Re: [DISCUSS] CEP-26: Unified Compaction Strategy

2022-12-20 Thread Henrik Ingo
I noticed the CEP doesn't link to this, so it's worth mentioning that the
UCS documentation is available here:
https://github.com/datastax/cassandra/blob/ds-trunk/doc/unified_compaction.md

Both of the above do a poor job of referencing the literature we've been
inspired by, so I will link to Mark Callaghan's blog on the subject:

http://smalldatum.blogspot.com/2018/07/tiered-or-leveled-compaction-why-not.html?m=1

...and, lazily, I will also borrow from Mark a post that references a bunch
of LSM (not just UCS-related) academic papers:
http://smalldatum.blogspot.com/2018/08/name-that-compaction-algorithm.html?m=1

Finally, it's perhaps worth mentioning that UCS has been in production in
our Astra Serverless cloud service since it was launched in March 2021. The
version described by the CEP therefore already incorporates some
improvements based on observed production behaviour.

Henrik

On Mon, 19 Dec 2022, 15:41 Branimir Lambov,  wrote:

> Hello everyone,
>
> I would like to open the discussion on our proposal for a unified
> compaction strategy that aims to solve well-known problems with compaction
> and improve parallelism to permit higher levels of sustained write
> throughput.
>
> The proposal is here:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
>
> The strategy is based on two main observations:
> - that tiered and levelled compaction can be generalized as the same thing
> if one observes that both form exponentially-growing levels based on the
> size of sstables (or non-overlapping sstable runs) and trigger a compaction
> when more than a given number of sstables are present on one level;
> - that instead of "size" in the description above we can use "density",
> i.e. the size of an sstable divided by the width of the token range it
> covers, which permits sstables to be split at arbitrary points when the
> output of a compaction is written and still produce a levelled hierarchy.
>
> The latter allows us to shard the compaction space into
> progressively higher numbers of shards as data moves to the higher levels
> of the hierarchy, improving parallelism, space requirements and the
> duration of compactions, and the former allows us to cover the existing
> strategies, as well as hybrid mixtures that can prove more efficient for
> some workloads.
>
> Thank you,
> Branimir
>
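
To make the density observation above concrete, here is a minimal sketch
(in Java, with illustrative names and constants that are my assumptions,
not the actual UCS code) of how density generalizes size and keeps a split
sstable on the same level:

    // Sketch only: "density" lets an sstable be split at arbitrary token
    // boundaries without changing the level it belongs to.
    final class DensityLevels
    {
        // density = on-disk size divided by the fraction of the token
        // ring that the sstable (or sstable run) covers
        static double density(long sizeBytes, double tokenRangeFraction)
        {
            return sizeBytes / tokenRangeFraction;
        }

        // Levels grow exponentially by `fanout`: a large per-level
        // sstable threshold behaves like tiered compaction, a small one
        // like levelled compaction.
        static int level(double density, double baseBytes, int fanout)
        {
            if (density <= baseBytes)
                return 0;
            return (int) Math.floor(Math.log(density / baseBytes) / Math.log(fanout));
        }
    }

Splitting a 100GB sstable covering the whole ring into ten 10GB shards,
each covering a tenth of the ring, leaves every shard with the same density
(and therefore level) as the original - which is what makes the sharding
safe.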


Re: [DISCUSS] Taking another(other(other)) stab at performance testing

2023-01-07 Thread Henrik Ingo
> …benchmark site:
> https://home.apache.org/~mikemccand/lucenebench/indexing.html
>
> I've checked in with a handful of performance-minded contributors in early
> December and we came up with a first draft, then some others of us met on
> an adhoc call on 12/9 (which was recorded; ping on this thread if you'd
> like that linked - I believe Joey Lynch has that).
>
> Here's where we landed after the discussions earlier this month (1st page,
> estimated reading time 5 minutes):
> https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#
>
> Curious to hear what other perspectives there are out there on the topic.
>
> Early Happy New Years everyone!
>
> ~Josh
>
>
>

-- 

Henrik Ingo

+358 40 569 7354

<https://www.datastax.com/>  <https://twitter.com/DataStaxEng>
<https://www.youtube.com/channel/UCqA6zOSMpQ55vvguq4Y0jAg>
<https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] Taking another(other(other)) stab at performance testing

2023-01-10 Thread Henrik Ingo
Since I cited several papers in my essay below, I might as well add the
latest one, which describes our use of automatic change point detection
inside Datastax. We've indirectly been testing Cassandra 4.0 already over a
year with this method, as we use change detection against an internal fork
of 4.0.

https://arxiv.org/abs/2301.03034

henrik

On Sun, Jan 8, 2023 at 5:12 AM Henrik Ingo  wrote:

> Hi Josh, all
>
> I'm sitting at an airport, so rather than participating in the comment
> threads in the doc, I will just post some high level principles I've
> derived during my own long career in performance testing.
>
> Infra:
>  - It's a common myth that you need to use on-premise HW because cloud HW
> is noisy.
>  - Most likely the opposite is true: A small cluster of lab hardware runs
> the risk of some sysadmin with root access manually modifying the servers
> and leaving them in an inconsistent configuration. Otoh a public cloud is
> configured with infrastructure as code, so every change is tracked in
> version control.
>  - Four part article on how we tuned EC2 at my previous employer: 1
> <https://www.mongodb.com/blog/post/reducing-variability-performance-tests-ec2-setup-key-results>,
> 2
> <https://www.mongodb.com/blog/post/repeatable-performance-tests-ec2-instances-neither-good-nor-bad>,
> 3
> <https://www.mongodb.com/blog/post/repeadtable-performance-tests-ebs-instances-stable-option>
> , 4
> <https://www.mongodb.com/blog/post/repeatable-performance-tests-cpu-options-best-disabled>
> .
>  - Trust no one, measure everything. For example, don't trust that what
> I'm writing here is true. Run sysbench against your HW, then you have
> first-hand observations.
>  - Using EC2 specifically has the additional benefit that its instance
> types can be considered well-known, standard HW configurations, more so
> than any on-premise system.
>
> Performance testing is regression testing
>  - Important: Run perf tests with the nightly build. Make sure your HW
> configuration is repeatable and has low variability from day to day.
>  - Less important / later:
>  - Using complicated benchmarks (TPC-C...) that try to model a real
> world app. These can take weeks to develop, each.
>  - Having lots of different benchmarks for "coverage".
>  - Adding the above two together: Running a simple key-value test (e.g.
> YCSB) every night in an automated and repeatable way, and storing the
> result - whatever is considered relevant - so that you end up with a
> timeseries is a great start and I'd take this over that complicated
> "representative" benchmark any day.
>  - Use change detection to automatically and deterministically flag
> statistically significant change points (regressions).
>  - Literature: detecting-performance-regressions-with-datastax-hunter
> <https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4>
> ,
>  - Literature: Fallout: Distributed Systems Testing as a Service
> <https://www.semanticscholar.org/paper/0cebbfebeab6513e98ad1646cc795cabd5ddad8a>
>  Automated system performance testing at MongoDB
> <https://www.connectedpapers.com/main/0cebbfebeab6513e98ad1646cc795cabd5ddad8a/graph>
>
>
> Common gotchas:
>  - Testing with a small data set that fits entirely in RAM. A good dataset
> is 5x the RAM available to the DB process. Or you just test with the size a
> real production server would be running - at Datastax we have tests that
> use a 1TB and 1.5TB data set, because those tend to be standard maximum
> sizes (per node) at customers.
>  - The test runtime is too short. It depends on the database what a good
> test duration is. The goal is to reach a stable state, but for an LSM
> database like Cassandra this can be hard. For other databases I worked
> with, the default is typically to flush every 15 to 60 seconds, and the
> test duration should be a multiple of those (3 to 5 min).
>  - Naive comparisons to determine whether a test result is a regression or
> not. For example benchmarking the new release against the stable version,
> one run each, and reporting the result as "fact". Or comparing today's
> result with yesterday's.
>
> Building perf testing systems following the above principles has had a
> lot of positive impact in my projects. For example, at my previous employer
> we caught 17 significant regressions during the 1-year-long development
> cycle of the next major version (see my paper above). Otoh after the GA
> release, during the next year users only reported 1 significant performance
> regression. That is to say, the perf testing of nightly builds caught all
> but one regression.
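
To make the change-detection point above concrete, here is a hedged sketch
(illustrative thresholds and made-up numbers; Hunter uses a proper
statistical change point detection algorithm, so treat this only as the
simplest possible stand-in): flag a nightly result when it deviates from
the trailing window's median by too many robust standard deviations.

    // Sketch only, not the Hunter algorithm: a MAD-based outlier check
    // over a nightly benchmark timeseries.
    import java.util.Arrays;

    final class NightlyCheck
    {
        static double median(double[] xs)
        {
            double[] s = xs.clone();
            Arrays.sort(s);
            int n = s.length;
            return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
        }

        static boolean changed(double[] history, double latest, double threshold)
        {
            double m = median(history);
            double[] dev = new double[history.length];
            for (int i = 0; i < history.length; i++)
                dev[i] = Math.abs(history[i] - m);
            double mad = Math.max(median(dev), 1e-9); // robust spread, avoid div-by-zero
            return Math.abs(latest - m) / (1.4826 * mad) > threshold;
        }

        public static void main(String[] args)
        {
            // Two weeks of made-up nightly throughput results (ops/s)...
            double[] nightly = { 102.1, 99.8, 101.4, 100.9, 98.7, 101.0, 100.2,
                                 99.5, 100.8, 101.7, 100.1, 99.9, 100.6, 100.3 };
            // ...then an ~8% drop, which gets flagged.
            System.out.println(changed(nightly, 92.4, 5.0)); // true
        }
    }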

Re: Intra-project dependencies

2023-01-16 Thread Henrik Ingo
> It seems like our use case is one of the primary ones git submodules are
> designed to address.
>
> On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
>
>
> I guess option 5 is what we have today in cep-15, have the build file grab
> the relevant SHA for the library. This way you maintain a precise SHA for
> builds and scripts don’t have to be modified.
>
> I believe this is also possible with git submodules, but I’m happy to bake
> this into our build file instead with a script.
>
> > As the library itself no longer has an explicit version, what I presume
> you meant by logical version.
>
> I mean that we don’t want to duplicate work and risk diverging
> functionality maintaining what is logically (meant to be) the same code. As
> a developer, managing all of the branches is already a pain. Libraries
> naturally have a different development cadence to the main project, and
> tying the development to C* versions is just an unnecessary ongoing burden
> (and risk) that we can avoid.
>
> There’s also an additional penalty: we reduce the likelihood of outside
> contributions to the libraries only. Accord in particular I hope will
> attract outside interest if it is maintained as a separate library, as it
> has broad applicability, and is likely of academic interest. Tying it to C*
> version and more tightly coupling with C* codebase makes that less likely.
> We might also see folk interested in our utilities, or our simulator
> framework, if they were to be maintained separately, which could be
> valuable.
>
>
>
>
> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
>
> 
>
> I think (4) is the only sensible option. It permits different development
> branches to easily reference different versions of a library and also to
> easily co-develop them - from within the same IDE project, even.
>
>
>
> I've only heard horror stories about submodules. The challenges they bring
> should be listed and checked.
>
> Some examples
>  - you can no longer just `git clone …`  (and we clone automatically in a
> number of places)
>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>  - permanence from a git SHA no longer exists
>  - our releases get more complicated (our source tarballs are the ASF
> releases)
>  - handling patches that cover submodules
>  - switching branches, and using git worktrees, during dev
>
> I see (4) as a valid option, but am concerned with the amount of work
> required to adapt to it, and whether it will only make it more complicated
> for the new contributor to the project. For example the first two points
> are addressed by remembering to do `git clone --recurse-submodules …` . And
> who would be fixing our build/test/release scripts to accommodate?
>
> Not blockers, just concerns we need to raise and address.
>
>
>
> We might even be able to avoid additional release votes as a matter of
> course, by compiling the library source as part of the C* release, so that
> they adopt the C* release vote (or else we may periodically release the
> library as we do other releases)
>
>
>
> Yes. Today we do a combination of first (3) and then (1). Having to make a
> release of these libraries every time a patch (/feature branch) is
> completing is a horror story in itself.
>
>
> I might be missing something, does anyone have any other bright ideas for
> approaching this problem? I’m sure there are plenty of opinions out there.
>
>
>
> Looking at the problem with these libraries,
>  - we don't need releases
>  - we don't have a clean version/branch parity to in-tree
>  - codebase parity between branches is important for upgrade tests (shared
> classloaders)
>
>  For (2) you mention drift of the "same" version, isn't this only a
> problem for dtest-api in the way it requires the "same version" of a
> codebase for compatibility when running upgrade tests? As the library
> itself no longer has an explicit version, what I presume you meant by
> logical version.
>
> To begin with, I'm leaning towards (2) because it is a cognitive re-use of
> our release branches, and the problems around classpath compatibility can
> be solved with tests. I'm sure I'm not seeing the whole picture though…
>
>
>

-- 

Henrik Ingo

+358 40 569 7354

<https://www.datastax.com/>  <https://twitter.com/DataStaxEng>
<https://www.youtube.com/channel/UCqA6zOSMpQ55vvguq4Y0jAg>
<https://www.linkedin.com/in/heingo/>


Re: Intra-project dependencies

2023-01-16 Thread Henrik Ingo
>> With the caveat that I haven't worked w/submodules before and only know
>> about them from a cursory search, it looks like git-submodule status would
>> show us the sha for submodules and we could have parent projects reference
>> specific shas to pull for submodules to build?
>> https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203
>>
>> It seems like our use case is one of the primary ones git submodules are
>> designed to address.
>>
>> On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote:
>>
>>
>> I guess option 5 is what we have today in cep-15, have the build file
>> grab the relevant SHA for the library. This way you maintain a precise SHA
>> for builds and scripts don’t have to be modified.
>>
>> I believe this is also possible with git submodules, but I’m happy to
>> bake this into our build file instead with a script.
>>
>> > As the library itself no longer has an explicit version, what I
>> presume you meant by logical version.
>>
>> I mean that we don’t want to duplicate work and risk diverging
>> functionality maintaining what is logically (meant to be) the same code. As
>> a developer, managing all of the branches is already a pain. Libraries
>> naturally have a different development cadence to the main project, and
>> tying the development to C* versions is just an unnecessary ongoing burden
>> (and risk) that we can avoid.
>>
>> There’s also an additional penalty: we reduce the likelihood of outside
>> contributions to the libraries only. Accord in particular I hope will
>> attract outside interest if it is maintained as a separate library, as it
>> has broad applicability, and is likely of academic interest. Tying it to C*
>> version and more tightly coupling with C* codebase makes that less likely.
>> We might also see folk interested in our utilities, or our simulator
>> framework, if they were to be maintained separately, which could be
>> valuable.
>>
>>
>>
>>
>> On 16 Jan 2023, at 10:49, Mick Semb Wever  wrote:
>>
>> 
>>
>> I think (4) is the only sensible option. It permits different development
>> branches to easily reference different versions of a library and also to
>> easily co-develop them - from within the same IDE project, even.
>>
>>
>>
>> I've only heard horror stories about submodules. The challenges they
>> bring should be listed and checked.
>>
>> Some examples
>>  - you can no longer just `git clone …`  (and we clone automatically in a
>> number of places)
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>  - permanence from a git SHA no longer exists
>>  - our releases get more complicated (our source tarballs are the ASF
>> releases)
>>  - handling patches that cover submodules
>>  - switching branches, and using git worktrees, during dev
>>
>> I see (4) as a valid option, but am concerned with the amount of work
>> required to adapt to it, and whether it will only make it more complicated
>> for the new contributor to the project. For example the first two points
>> are addressed by remembering to do `git clone --recurse-submodules …` . And
>> who would be fixing our build/test/release scripts to accommodate?
>>
>> Not blockers, just concerns we need to raise and address.
>>
>>
>>
>> We might even be able to avoid additional release votes as a matter of
>> course, by compiling the library source as part of the C* release, so that
>> they adopt the C* release vote (or else we may periodically release the
>> library as we do other releases)
>>
>>
>>
>> Yes. Today we do a combination of first (3) and then (1). Having to make
>> a release of these libraries every time a patch (/feature branch) is
>> completing is a horror story in itself.
>>
>>
>> I might be missing something, does anyone have any other bright ideas for
>> approaching this problem? I’m sure there are plenty of opinions out there.
>>
>>
>>
>> Looking at the problem with these libraries,
>>  - we don't need releases
>>  - we don't have a clean version/branch parity to in-tree
>>  - codebase parity between branches is important for upgrade tests
>> (shared classloaders)
>>
>>  For (2

Re: Intra-project dependencies

2023-01-17 Thread Henrik Ingo
> …forwarding the submodule to HEAD's SHA breaks things,
> do you now have to fix that or introduce branching in the submodule? If the
> submodule doesn't have releases, is it doing versioning, and if not how are
> branches distinguished?
> > >
> > > Arn't these all fair enquiries to raise?
> > >
> > >> - you need to be making commits to all branches (and forward merging)
> anyway to update submodule SHAs,
> > >>
> > >>
> > >> Exactly as you would any library upgrade?
> > >
> > >
> > >
> > > Correct. submodules does not solve/remove the need to commit to
> multiple branches and forward merge.
> > > Furthermore submodules means at least one additional commit, and
> possibly twice as many commits.
> > >
> > >
> > >> - if development is active on trunk, and then you need an update on
> an older branch, you have to accommodate to backporting all those trunk
> changes (or introduce the same branching in the submodule),
> > >>
> > >>
> > >> If you do feature development against Accord then you will obviously
> branch it? You would only make bug fixes to a bug fix branch. I’m not sure
> what you think is wrong here.
> > >
> > >
> > >
> > > That's not obvious, you stated that a goal was to avoid maintaining
> multiple branches. Sure there's benefits to a lazy branching approach, but
> it contradicts your initial motivations and introduces methodology changes
> that are worth pointing out. What happens when there are multiple consumers
> of Accord, and (like the situation we face with jamm) its HEAD is well in
> front of anything C* is using.
> > >
> > > As Henrik states, the underlying problem doesn't change, we're just
> choosing between trade-offs. My concern is that we're not even doing a very
> good job of choosing between the trade-offs. Based on past experiences with
> submodules: that started with great excitement and led to tears and
> frustration after a few years; I'm only pushing for a more thorough
> discussion and proposal.
> > >
> > >
> > >
> > >
> >
> >
> > --
> > +---+
> > | Derek Chen-Becker |
> > | GPG Key available at https://keybase.io/dchenbecker and |
> > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> > +---+
>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>


-- 

Henrik Ingo

+358 40 569 7354

<https://www.datastax.com/>  <https://twitter.com/DataStaxEng>
<https://www.youtube.com/channel/UCqA6zOSMpQ55vvguq4Y0jAg>
<https://www.linkedin.com/in/heingo/>


Re: Intra-project dependencies

2023-01-17 Thread Henrik Ingo
On Tue, Jan 17, 2023 at 11:40 PM Mick Semb Wever  wrote:

>
> It introduces some overhead when bisecting to go from the snapshot's
> timestamp to the other repo's SHA (this is easily solvable by putting the
> SHA inside the jarfile).
>

Whatever system we choose, the link should be the SHA. It shouldn't be
necessary for a human to look up the right version via some mapping from
other parameters.

A basic first-order requirement: restarting an old build in Jenkins should
rerun the exact same version of all modules.

More complex requirement: starting a Jenkins build from Cassandra commit
123abc should check out/download/use the correct versions of other modules
as of the time 123abc was committed.
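
To illustrate the "SHA inside the jarfile" remark above: stamp the
dependency's git SHA into the jar manifest at build time and read it back
when mapping a build to source. A hedged sketch; the manifest attribute
name is my assumption, not an existing convention:

    // Sketch only: recover the git SHA stamped into a jar's manifest.
    import java.util.jar.JarFile;
    import java.util.jar.Manifest;

    final class JarSha
    {
        public static void main(String[] args) throws Exception
        {
            try (JarFile jar = new JarFile(args[0]))
            {
                Manifest mf = jar.getManifest();
                String sha = mf == null
                             ? "<no manifest>"
                             : mf.getMainAttributes().getValue("Implementation-Git-SHA");
                System.out.println(args[0] + " was built from " + sha);
            }
        }
    }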

henrik


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Intra-project dependencies

2023-01-20 Thread Henrik Ingo
Thanks Mick and David. I've been following this silently for a few days
because we already exhausted my knowledge on the topic. But it seems your
collective knowledge is uncovering a nice solution.

If I summarize, I like all of this:

- link to a SHA, not a library version
- use git submodules, because this is what they are meant for / it's
standard
- use git hooks to automate the otherwise annoying UX of submodules
- use gradle to automate the installation of the hooks (note: imo it must
ask the user for explicit permission)
- whether or not the user installed the hooks, the build system should by
default check and refuse to build with the wrong SHA in any submodule, but
allow overrides (a sketch of such a guard is at the end of this mail)
- the build system, source tarball etc. should treat the submodules as just
another directory in the source tree. Things should work the same whether
you are in a git checkout or a source tarball.

Henrik

On Fri, 20 Jan 2023, 02:54 David Capwell,  wrote:

> Thanks for the reply, my replies are inline to your inline replies =D
>
> On Jan 19, 2023, at 2:39 PM, Mick Semb Wever  wrote:
>
>
> Thanks David for the detailed write up. Replies inline…
>
>
>
>> We tried in-tree for in-jvm dtest and found that this broke every other
>> commit… maintaining the APIs across all our supported branches was too hard
>> to do and moving it outside of the tree helped make the upgrade tests more
>> stable (there were breakages, but less frequent)….
>>
>
>
> The in-jvm dtest-api library is unique in this way. I would not use it as
> reasoning that other libraries should not be in-tree.
>
>
> Fair, it's the only API we have that is required to be byte code compatible
> cross versions/builds; this unique property may argue for different
> solutions than others
>
>
>
>
>
>> We tried to do snapshot builds where the version contained the SHA, but
>> this has the issue that snapshot builds “may” go away over time and made
>> older SHAs no longer building…
>>
>
>
> Only keeping the last snapshot in repository.a.o is INFRA's policy (I've
> found out).
> We can ask INFRA to set up a separate snapshots repository just for us,
> with a longer expiry policy. I'd rather not create extra work for INFRA if
> there are other ways we can do this, and this approach would always require
> some fallback for rebuilding the dependency's SHA from scratch.
>
>
> If they will allow this and allow the snapshots to never be purged, then I
> am ok with this as a solution.
>
>
>
>
>
>> We break python-dtest when cross-cutting changes are added as CI is hard
>> to do correctly or not supported (testing downstream users (our 4 supported
>> branches) is rarely done).
>>
>
>
> python dtest is also in a different category (the context and
> consumption go in a different direction, i.e. it's not a library used
> in-tree).
>
>
> I disagree.  The point I was making is we have several dependencies and we
> should think about how we maintain them.  My point is still valid that
> python dtests are involved with cross-cutting changes to Cassandra, and the
> lack of downstream testing has broken us several times.  The solution to
> this problem may be different than Accord (as C* doesn’t depend on python
> dtest as you point out), but that does not mean we shouldn’t think about it
> in this conversation….
>
> One thing that comes to mind is that dependencies may benefit from running
> a limited C* CI as part of their merge process.  At the moment people are
> expected to create a tmp CI branch for all 4 supported C* versions, point
> it to the python dtest change, then submit to the JIRA as proof that CI was
> run… normally when I find python dtest broke in branch X I find this had
> not happened…
>
> This holds true, I believe, for JVM dtest as well: we should be validating
> that the 4 target C* branches still work if you are touching jvm dtest…
>
> Now, with all that, Accord being external will have similar issues: a
> change there may break Cassandra, so we should include a subset of Cassandra
> tests in Accord’s CI.
>
>
>
>
>> * [nice to have] be able to work with all subprojects in one IDE and not
>> have to switch between windows while making cross-cutting changes
>>
>
>
> Isn't it only IntelliJ that suffers this problem? (That doesn't invalidate
> it, just asking…)
>
>
> I have not used Eclipse or NetBeans for around 10 years so no clue!
>
>
>
>
>>  - same with `git pull …` (easy to be left with out-of-sync submodules)
>>
>>
>> Correct, if you use submodules/script you have a text file saying what we
>> “should” use, but this does not enforce actually using them… again we could
>> make sure build.xml does the right thing,
>>
>
>
> If we try this approach out, I'm definitely in favour of any build.xml
> command immediately failing if `git submodule status` != `git submodule
> status --cached`
>
>
> +1
>
>
>
>
>> but this can be confusing for people who mainly build in IDE and don’t
>> depend on build.xml until later in development… this is something we should
>> think about…
>>
>
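
For what it's worth, the build-time guard agreed on above is small. A
hedged sketch of what it could do (illustrative Java rather than actual
build.xml code; the two git commands are real, everything else is an
assumption):

    // Sketch only: fail the build when checked-out submodule SHAs differ
    // from the SHAs pinned in the superproject index.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    final class SubmoduleGuard
    {
        static String run(String... cmd) throws Exception
        {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream())))
            {
                for (String line; (line = r.readLine()) != null; )
                    out.append(line).append('\n');
            }
            p.waitFor();
            return out.toString();
        }

        public static void main(String[] args) throws Exception
        {
            // `git submodule status` reports the checked-out SHAs (with a
            // '+' prefix when they differ from the index); `--cached`
            // reports the pinned SHAs. Equal output means in sync.
            String actual = run("git", "submodule", "status");
            String pinned = run("git", "submodule", "status", "--cached");
            if (!actual.equals(pinned))
                throw new IllegalStateException("Submodules out of sync:\n" + actual);
        }
    }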

Re: Merging CEP-15 to trunk

2023-01-20 Thread Henrik Ingo
I might be completely off, but I think what others are referring to here is
that 2 committers is the minimum bar, and that for any commit there could be
other contributors wishing to review some part, or even the whole, of what
is being merged, and we would always allow for that, within reasonable time
limits.

Since most contributors would not have paid attention to a feature branch,
the result is that that additional review happens now - if it happens, if
anyone is interested. But if nobody expresses any concerns or asks for time
to look into something specific, then I agree that the reviews that have
already happened in the feature branch are sufficient and there isn't a
need for a new full-blown review.

As far as I can tell, this email thread is exactly that process, and I
imagine was at least one of the reasons to send this heads-up email?

henrik

On Fri, Jan 20, 2023 at 5:23 PM Aleksey Yeshchenko 
wrote:

> What Benedict says is that the commits into cassandra/cep-15-accord and
> cassandra-accord/trunk branch have all been vetted by at least two
> committers already. Each authored by a Cassandra committer and then
> reviewed by a Cassandra committer. That *is* our bar for merging into
> Cassandra trunk.
>
> On 20 Jan 2023, at 12:31, Mick Semb Wever  wrote:
>
>
>> These tickets have all met the standard integration requirements, so I’m
>> just unclear what “higher pre-commit gateway” you are referring to.
>>
>
>
> A merge into trunk deserves more eyeballs than a merge into a feature
> branch.
>
> We can refer to this as a "higher pre-commit gateway" or a "second pass".
> Either way I believe it is a good thing.
>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Merging CEP-15 to trunk

2023-01-24 Thread Henrik Ingo
On Tue, Jan 24, 2023 at 1:11 AM Jeff Jirsa  wrote:

>  But it's not merge-then-review, because they've already been
> reviewed, before being merged to the feature branch, by committers
> (actually PMC members)?
>
>
There's no question that the feature branch already meets the minimum
requirement for review. If we look at
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance
that would be point #3 in the process for code contributions.

On the other hand it seems clear that there is still active discussion
about the code (including at least one change request: better commented
code) and at least one committer has requested reasonable time to review.
These would be points #4 and #5 in the governance process, so clearly we
are in the state that code "must not be committed".

Presumably it makes sense to review only after the code has been better
documented, so the reasonable time might not have started yet? This is just
my understanding of the email discussion, I'm not participating in the
review myself.

I can also attest that at least for Jacek and Mick, as well as Ekaterina if
she participates, reviewing Accord will be the top priority of their day
job, so I believe the "reasonable time" criterion is clearly met in this
case.


If we step back from the process a bit, the above could also be summarized
as: It's not reasonable to expect that every committer would or should
track work happening in every feature branch. It's only natural that there
will need to be time for review at the point of merging to trunk.


PS: Personally I don't really believe in commit-then-review. If the author
of the patch is committed to respond to review comments with high priority,
it shouldn't make a difference to them whether the code is committed before
or after the review. And of course if they aren't committed to work with
the reviewer with high priority, then what's the point of reviewing at all?
Since a reviewer is already obligated to do their part in a reasonable
time, it follows that merging after the review is the process where
incentives are aligned, AND there's no downside to the patch author.


You want code that's been written by one PMC member and reviewed by 2 other
> PMC members to be put up for review by some random 4th party? For how long?
>
>
Why is there a difference whether the reviewer is a committer or a PMC
member?

henrik

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Merging CEP-15 to trunk

2023-01-24 Thread Henrik Ingo
When was the last time the feature branch was rebased? Assuming it's a
while back and the delta is significant, surely it's normal process to
first rebase, run tests, and then present the branch for review?

To answer your question: The effect of the rebase is then either baked into
the original commits (which I personally dislike), or you can also have the
rebase-induced changes as their own commits. (Which can get tedious, but
has the benefit of making explicit what was only a change due to rebasing.)
Depending on which approach you take when rebasing, a reviewer would then
review accordingly.

henrik

On Tue, Jan 24, 2023 at 11:14 AM Benedict  wrote:

> No, that is not the normal process. What is it you think you would be
> reviewing? There are no diffs produced as part of rebasing, and the purpose
> of review is to ensure code meets our standards, not that the committer is
> competent at rebasing or squashing. Nor are you familiar with the code as
> it was originally reviewed, so would have nothing to compare against. We
> expect a clean CI run, ordinarily, not an additional round of review. If we
> were to expect that, it would be by the original reviewer, not a third
> party, as they are the only ones able to judge the rebase efficiently.
>
> I would support investigating tooling to support reviewing rebases. I’m
> sure such tools and processes exist. But, we don’t have them today and it
> is not a normal part of the review process. If you want to modify, clarify
> or otherwise stipulate new standards or processes, I suggest a separate
> thread.
>
> > How will the existing tickets make it clear when and where their final
> merge happened?
>
> By setting the release version and source control fields.
>
>
>
> On 24 Jan 2023, at 08:43, Mick Semb Wever  wrote:
>
> 
>
>>  But it's not merge-then-review, because they've already been
>> reviewed, before being merged to the feature branch, by committers
>> (actually PMC members)?
>>
>> You want code that's been written by one PMC member and reviewed by 2
>> other PMC members to be put up for review by some random 4th party? For how
>> long?
>>
>
>
> It is my hope that the work as-is is not being merged. That there is a
> rebase and some trivial squashing to do. That deserves a quick check by
> another. Ideally this would be one of the existing reviewers (but like any
> other review step, no matter how short and trivial it is, that's still an
> open process). I see others already doing this when rebasing larger patches
> before the final merge.
>
> Will the branch be rebased and cleaned up?
> How will the existing tickets make it clear when and where their final
> merge happened?
>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Merging CEP-15 to trunk

2023-01-24 Thread Henrik Ingo
> …have a different bar for review on CEP feature branches
> (3 committers? 1+ pmc members? more diversity in reviewers or committers as
> measured by some as yet unspoken metric), perhaps we could have that
> discussion. FWIW I'm against changes there as well; we all wear our Apache
> Hats here, and if the debate is between work like this happening in a
> feature branch affording contributors increased efficiency and locality vs.
> all that happening on trunk and repeatedly colliding with everyone
> everywhere, feature branches are a clear win IMO.
>
> And for 3 - I think we've all broadly agreed we shouldn't ninja commit
> unless it's a comment fix, typo, forgotten git add, or something along
> those lines. For any commit that doesn't qualify it should go through the
> review process.
>
> And a final note - Ekaterina alluded to something valuable in her email
> earlier in this thread. I think having a "confirm green on all the test
> suites that are green on merge target" bar for large feature branches
> (rather than strictly the "pre-commit subset") before merge makes a lot of
> sense.
>
> On Tue, Jan 24, 2023, at 1:41 PM, Caleb Rackliffe wrote:
>
> Just FYI, I'm going to be posting a Jira (which will have some
> dependencies as well) to track this merge, hopefully some time today...
>
> On Tue, Jan 24, 2023 at 12:26 PM Ekaterina Dimitrova <
> e.dimitr...@gmail.com> wrote:
>
> I actually see people all the time making a final check before merge as
> part of the review. And I personally see it only as a benefit when it comes
> to serious things like Accord, as an example. Even as a help for the author
> who is overwhelmed with the big amount of work already done - someone to do
> quick last round of review. Team work after all.
>
> Easy rebase - that is great news. I guess any merge conflicts that were
> solved will be documented and confirmed with reviewers before merge on the
> ticket where the final CI push will be posted. I also assumed that even
> without direct conflicts a check that there is no contradiction with any
> post-September commits is done as part of the rebase. (Just adding here for
> completeness)
>
> One thing that I wanted to ask for is when you push to CI, you or whoever
> does it, to approve all jobs. Currently we have pre-approved the minimum
> required jobs in the pre-commit workflow. I think in this case with a big
> work approving all jobs in CircleCI will be of benefit. (I also do it for
> bigger bodies of work to be on the safe side) Just pointing in case you use
> a script or something to push only the pre-approved ones. Please ping me in
> Slack if It’s not clear what I mean, happy to help with that
>
> On Tue, 24 Jan 2023 at 11:52, Benedict  wrote:
>
>
> Perhaps the disconnect is that folk assume a rebase will be difficult and
> have many conflicts?
>
> We have introduced primarily new code with minimal integration points, so
> I decided to test this. I managed to rebase locally in around five minutes;
> mostly imports. This is less work than for a rebase of a fairly typical
> ticket of average complexity.
>
> Green CI is of course a requirement. There is, however, no good procedural
> or technical justification for a special review of the rebase.
>
> Mick is encouraged to take a look at the code before and after rebase, and
> will be afforded plenty of time to do so. But I will not gate merge on this
> adhoc requirement.
>
>
>
>
> On 24 Jan 2023, at 15:40, Ekaterina Dimitrova 
> wrote:
>
> 
>
> Hi everyone,
> I am excited to see this work merged. I noticed the branch is 395 commits
> behind trunk, i.e. not rebased since September last year. I think if Mick or
> anyone else wants to make a final pass after rebase happens and CI runs -
> this work can only benefit of that. Squash, rebase and full CI run green is
> the minimum that, if I read correctly the thread, we all agree on that
> part.
> I would say that CI and final check after a long rebase of a patch is a
> thing we actually do all the time even for small patches when we get back
> to our backlog of old patches. This doesn’t mean that the previous reviews
> are dismissed or people not trusted or anything like that.
> But considering the size and the importance of this work, I can really see
> only benefit of a final cross-check.
> As Henrik mentioned me, I am not sure I will have the chance to review
> this work any time soon (just setting the right expectations up front) but
> I see at least Mick already mentioning he would do it if there are no other
> volunteers. Now, whether it will be separate ticket or not, that is a
> different story. Aren’t the Accord tickets in an epic under which 

Re: Merging CEP-15 to trunk

2023-01-25 Thread Henrik Ingo
Thanks Benedict

For brevity I'll respond to your email, although indirectly this is also a
continuation of my debate with Josh:

At least on my scorecard, one issue was raised regarding the actual code:
CASSANDRA-18193, Provide design and API documentation. Since the addition of
code comments also significantly impacts the ability of an outsider to
understand and review the code, I would treat it as an open question how
much else such a fresh review would uncover.

By the way I would say the discussion about git submodules (and all the
other alternatives) in a broad sense was also a review'ish comment.

That said, yes of course the expectation is that if the code has already
been reviewed, and by rather experienced Cassandra developers too, there
probably won't be many issues found, and there isn't a need for several
weeks of line by line re-review.

As for the rebase, I think that somehow started dominating this discussion,
but in my view was never the only reason. For me this is primarily to
satisfy points 4 and 5 in the project governance, that everyone has had an
opportunity to review the code, for whatever reason they may wish to do so.

I should say for those of us on the sidelines we certainly expected a
rebase catching up 6 months and ~500 commits to have more substantial
changes. Hearing that this is not the case is encouraging, as it also
suggests the changes to Cassandra code are less invasive than maybe I and
others had imagined.

henrik

On Wed, Jan 25, 2023 at 1:51 PM Benedict  wrote:

> contributors who didn't actively work on Accord, have assumed that they
> will be invited to review now
>
>
> I may have missed it, but I have not seen anyone propose to substantively
> review the actual *work*, only the impact of rebasing. Which, honestly,
> there is plenty of time to do - the impact is essentially nil, and we
> aren’t planning to merge immediately. I will only not agree to an adhoc
> procedural change to prevent merge until this happens, as a matter of
> principle.
>
> However, I am very sympathetic to a desire to participate *substantively*.
> I think interested parties should have invested as the work progressed, but
> I *don’t* think it is unreasonable to ask for a *some* time prior to
> merge if this hasn’t happened.
>
> So, if you can adequately resource it, we can delay merging a while
> longer. I *want* your (constructive) participation. But, I have not seen
> anything to suggest this is even proposed, let alone realistic.
>
> There are currently five full time contributors participating in the
> Accord project, with cumulatively several person-years of work already
> accumulated. By the time even another month has passed, you will have
> another five person-months of work to catch-up on. Resourcing even a review
> effort to catch up with this is *non-trivial*, and for it to be a
> reasonable ask, you must credibly be able to keep up while making useful
> contributions.
>
> After all, if it had been ready to merge to trunk already a year ago, why
> wasn't it?
>
>
> The Cassandra integration has only existed since late last year, and was
> not merged earlier to avoid interfering with the effort to release 4.1.
>
> One thing that I wanted to ask for is when you push to CI, you or whoever
> does it, to approve all jobs.
>
>
> Thanks Ekaterina, we will be sure to fully qualify the CI result, and I
> will make sure we also run your flaky test runner on the newly introduced
> tests.
>
>
>
>
> On 24 Jan 2023, at 21:42, Henrik Ingo  wrote:
>
> 
> Thanks Josh
>
> Since you mentioned the CEP process, I should also mention one sentiment
> you omitted, but worth stating explicitly:
>
> 4. The CEP itself should not be renegotiated at this point. However, the
> reviewers should rather focus on validating that the implementation matches
> the CEP. (Or if not, that the deviation is of a good reason and the
> reviewer agrees to approve it.)
>
>
> Although I'm not personally full time working on either producing
> Cassandra code or reviewing it, I'm going to spend one more email defending
> your point #1, because I think your proposal would lead to a lot of
> inefficiencies in the project, and that does happen to be my job to care
> about:
>
>  - Even if you could be right, from some point of view, it's nevertheless
> the case that those contributors who didn't actively work on Accord, have
> assumed that they will be invited to review now, when the code is about to
> land in trunk. Not allowing that to happen would make them feel like they
> weren't given the opportunity and that the process in Cassandra Project
> Governance was bypassed. We can agree to work differently in the future,
> but this is the reality now.
>

Re: Merging CEP-15 to trunk

2023-01-25 Thread Henrik Ingo
> …review such code would obviously be a
> waste of precious talent.
>
> This is an excellent point. The only mitigation I'd see for this would be
> an additional review period or burden collectively before merge of a
> feature branch into trunk once something has crossed a threshold of success
> as to be included, or stepping away from a project where you don't have the
> cycles to stay up to date and review and trust that the other committers
> working on the project are making choices that are palatable and acceptable
> to you.
>
> If all API decisions hit the dev ML and the architecture conforms
> generally to the specification of the CEP, it seems to me that stepping
> back and trusting your fellow committers to Do The Right Thing is the
> optimal (and scalable) approach here?
>
> Let's say someone in October 2021 was invested in the quality of the Cassandra
> 4.1 release. Should this person now invest in reviewing Accord or not? It's
> impossible to know. Again, in hindsight we know that the answer is no, but
> your suggestion again would require the person to review all active feature
> branches just in case.
>
> I'd argue that there's 3 times to really invest in the quality of any
> Cassandra release:
> 1. When we set agreed upon bars for quality we'll all hold ourselves
> accountable to (CI, code style, test coverage, etc)
> 2. When we raise new committers
> 3. When we write or review code
>
> I don't think it's sustainable to try and build processes that will
> bottleneck our throughput as a community to the finite availability of
> individuals if they're concerned about specific topics and want to be
> individually involved in specific code level changes. If folks feel like
> our current processes, CI infrastructure, and committer pool risks
> maintaining our bar of quality that's definitely something we should talk
> about in depth, as in my mind that's the backbone of us scaling stably as a
> project community.
>
>
> On Tue, Jan 24, 2023, at 4:41 PM, Henrik Ingo wrote:
>
> Thanks Josh
>
> Since you mentioned the CEP process, I should also mention one sentiment
> you omitted, but worth stating explicitly:
>
> 4. The CEP itself should not be renegotiated at this point. However, the
> reviewers should rather focus on validating that the implementation matches
> the CEP. (Or if not, that the deviation is of a good reason and the
> reviewer agrees to approve it.)
>
>
> Although I'm not personally full time working on either producing
> Cassandra code or reviewing it, I'm going to spend one more email defending
> your point #1, because I think your proposal would lead to a lot of
> inefficiencies in the project, and that does happen to be my job to care
> about:
>
>  - Even if you could be right, from some point of view, it's nevertheless
> the case that those contributors who didn't actively work on Accord, have
> assumed that they will be invited to review now, when the code is about to
> land in trunk. Not allowing that to happen would make them feel like they
> weren't given the opportunity and that the process in Cassandra Project
> Governance was bypassed. We can agree to work differently in the future,
> but this is the reality now.
>
>  - Although those who have collaborated on Accord testify that the code is
> of the highest quality and ready to be merged to trunk, I don't think that
> can be expected of every feature branch all the time. In fact, I'm pretty
> sure the opposite must have been the case also for the Accord branch at
> some point. After all, if it had been ready to merge to trunk already a
> year ago, why wasn't it? It's kind of the point of using a feature branch
> that the code in it is NOT ready to be merged yet. (For example, the
> existing code might be of high quality, but the work is incomplete, so it
> shouldn't be merged to trunk.)
>
>  - Uncertainty: It's completely ok that some feature branches may be
> abandoned without ever merging to trunk. Requiring the community (anyone
> potentially interested, anyways) to review such code would obviously be a
> waste of precious talent.
>
>  - Time uncertainty: Also - and this is also true for Accord - it is
> unknown when the merge will happen if it does. In the case of Accord it is
> now over a year since the CEP was adopted. If I remember correctly an
> initial target date for some kind of milestone may have been Summer of
> 2022? Let's say someone in October 2021 was invested in the quality of the
> Cassandra 4.1 release. Should this person now invest in reviewing Accord or
> not? It's impossible to know. Again, in hindsight we know that the answer
> is no, but your suggesti

Re: Merging CEP-15 to trunk

2023-01-27 Thread Henrik Ingo
While the substance of the review discussion has moved to Jira, I wanted to
come back here to clarify one thing:

I've learned that when I have defended the need (or right, if appealing to
the Governance texts...) for contributors to be able to review a feature
branch at the time it is merged to trunk - which for Accord is now - a
common reaction is that doing a review of Accord now might take months and
would stall the Accord project for months if that is allowed.

So I just wanted to clarify that I don't think that sounds "reasonable", as
the word is used in the Governance wiki page. I agree that to engage in such
a level of review, it would have needed to happen earlier. On the other
hand, I can think of many things that a pair of fresh eyes can do at this
point in reasonable time, like days or a couple of weeks.

I spent 6 hours this week glancing over the 28k lines of code that would be
added to the C* codebase. I was able to form an opinion of the patch, have
some fruitful off-list conversations with several people, and as a
by-product apparently also caught some commented-out code that possibly
should be enabled before the merge.

henrik

On Wed, Jan 25, 2023 at 5:06 PM Henrik Ingo 
wrote:

> Thanks Benedict
>
> For brevity I'll respond to your email, although indirectly this is also a
> continuation of my debate with Josh:
>
> At least on my scorecard, one issue was raised regarding the actual code:
> CASSANDRA-18193 Provide design and API documentation. Since the addition of
> code comments also significantly impacts the ability of an outsider to
> understand and review the code, I would then treat it as an unknown to say
> how much else such a fresh review would uncover.
>
> By the way I would say the discussion about git submodules (and all the
> other alternatives) in a broad sense was also a review'ish comment.
>
> That said, yes of course the expectation is that if the code has already
> been reviewed, and by rather experienced Cassandra developers too, there
> probably won't be many issues found, and there isn't a need for several
> weeks of line by line re-review.
>
> As for the rebase, I think that somehow started dominating this
> discussion, but in my view was never the only reason. For me this is
> primarily to satisfy points 4 and 5 in the project governance, that
> everyone has had an opportunity to review the code, for whatever reason
> they may wish to do so.
>
> I should say for those of us on the sidelines we certainly expected a
> rebase catching up 6 months and ~500 commits to have more substantial
> changes. Hearing that this is not the case is encouraging, as it also
> suggests the changes to Cassandra code are less invasive than maybe I and
> others had imagined.
>
> henrik
>
> On Wed, Jan 25, 2023 at 1:51 PM Benedict  wrote:
>
>> contributors who didn't actively work on Accord, have assumed that they
>> will be invited to review now
>>
>>
>> I may have missed it, but I have not seen anyone propose to substantively
>> review the actual *work*, only the impact of rebasing. Which, honestly,
>> there is plenty of time to do - the impact is essentially nil, and we
>> aren’t planning to merge immediately. I will only not agree to an adhoc
>> procedural change to prevent merge until this happens, as a matter of
>> principle.
>>
>> However, I am very sympathetic to a desire to participate *substantively*.
>> I think interested parties should have invested as the work progressed, but
>> I *don’t* think it is unreasonable to ask for a *some* time prior to
>> merge if this hasn’t happened.
>>
>> So, if you can adequately resource it, we can delay merging a while
>> longer. I *want* your (constructive) participation. But, I have not seen
>> anything to suggest this is even proposed, let alone realistic.
>>
>> There are currently five full time contributors participating in the
>> Accord project, with cumulatively several person-years of work already
>> accumulated. By the time even another month has passed, you will have
>> another five person-months of work to catch-up on. Resourcing even a review
>> effort to catch up with this is *non-trivial*, and for it to be a
>> reasonable ask, you must credibly be able to keep up while making useful
>> contributions.
>>
>> After all, if it had been ready to merge to trunk already a year ago, why
>> wasn't it?
>>
>>
>> The Cassandra integration has only existed since late last year, and was
>> not merged earlier to avoid interfering with the effort to release 4.1.
>>
>> One thing that I wanted to ask for is when you push to CI, you or whoever
>> does it, to approve all jobs

Re: Merging CEP-15 to trunk

2023-01-30 Thread Henrik Ingo
…credibility to the claim that the process was as rigorous as it is for
trunk, and looking at the
build history for a few minutes should put our minds at ease. I can't see
anything Accord related in Jenkins or Butler? But perhaps there's a
CircleCI dashboard somewhere?

From all the discussions I've had the past week, it seems the emerging
consensus is that protecting the stability of the CI pipeline is the top
concern. Other topics discussed (like comments in the code) are relatively
smaller issues.

If the feature branch indeed has all the CI machinery set up as if it were
trunk, then I agree chances are good it will merge into trunk smoothly. If
the CI coverage isn't 100%, then we should just identify the gaps, and
focus on that while preparing to merge. Either way, I know everyone is at
this point committed to CI stability, including the Accord authors, so I'm
not particularly worried personally.


>1. Code must not be committed if a committer has requested reasonable
>time to conduct a review
>
> I'm realizing in retrospect this leaves ambiguity around *where* the code
> is committed (i.e. trunk vs. feature branch). On the one hand we could say
> this code has already been reviewed and committed; the feature branch had
> the same bar as trunk. On the other hand you could say the intent of this
> agreement was to allow committers to inspect code *before it goes to
> trunk*, which would present a different set of constraints.
>
>
Without further qualification, I would have expected the latter, but I
realize there are good examples to support the opposite interpretation too.

I actually don't know that we need to rush to clarify that either. This is
the first time the situation came up, and after this discussion I'm sure
things will be clearer.



> My perspective is that of the former; I consider this code "already
> committed". This work was done on a formal feature branch w/JIRA tickets
> and feedback / review done on the PR's in the open on the feature branch by
> 2+ committers, so folks have had a reasonable time to engage with the
> process and conduct a review. I don't think right on the cusp of a
> ceremonial / procedural cutover from a feature branch to trunk is the right
> time, nor scope of work, to propose a blocking review based on this text.
>
>
Call me conservative, but I still don't think a dump / merge / addition of
28k + 49k lines of code (over 10% of the codebase!) is ceremonial /
procedural at all. The big CEPs I've been involved with in these past two
years (SAI, Tries, UCS...) are about 10-20k each I think, and I thought
those were big, and many of us worry about the impact to CI for those as
well. (For example, if this branch is mostly tested in CircleCI, and not
heavily on the ASF Jenkins, it could fail even for trivial reasons, like
running out of disk or RAM...) I hope we see this merged to trunk very
smoothly, but that's smoothly like a brig, not smoothly like a Laser-class
boat.



henrik
PS: My current status is that I've learned a lot about accord the past
week, even stumbled into a block of code that presumably should be
uncommented (now or later)... But the discussion on remaining items should
be left to contributors more relevant than myself. If you
don't see further communications from me, it just means I'm satisfied for
my part and expect others to speak up (or not) for those parts. (I guess I
will quietly continue to keep an eye on the CI aspect...)



-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Merging CEP-15 to trunk

2023-01-30 Thread Henrik Ingo
Oops, I missed copy-pasting this reply into my previous email:

On Fri, Jan 27, 2023 at 11:21 PM Benedict  wrote:

> I'm realizing in retrospect this leaves ambiguity
>
>
> Another misreading at least of the *intent* of these clauses, is that
> they were to ensure that concerns about a *design and approach* are
> listened to, and addressed to the satisfaction of interested parties. It
> was essentially codifying the project’s long term etiquette around pieces
> of work with either competing proposals or fundamental concerns. It has
> historically helped to avoid escalation to vetoes, or reverting code after
> commit.
>
> It wasn’t intended that *any* reason might be invoked, as seems to have
> been inferred, and perhaps this should be clarified, though I had hoped it
> would be captured by the word “reasonable". Minor concerns that haven’t
> been caught by the initial review process can always be addressed in
> follow-up work, as is very common.
>
>
Wouldn't you expect such concerns to at least partially have been covered
in the CEP discussion already, up front? At most, I would expect that at
this stage someone could validate that the implementation follows the CEP.
But I wouldn't expect a debate on competing approaches at this stage.

henrik
-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Merging CEP-15 to trunk

2023-01-30 Thread Henrik Ingo
On Tue, Jan 31, 2023 at 1:28 AM David Capwell  wrote:

> If the CI coverage isn't 100%, then we should just identify the gaps, and
> focus on that while preparing to merge
>
>
> It has 100% coverage that is done normally for trunk merges; which is a
> pre-commit CI run in Circle OR Jenkins.
>
>
Sure. And thanks.

But during development, did you ever run nightly tests / all tests? I
wouldn't want the night after merging to trunk to be the first time those
are run.

henrik
-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-02 Thread Henrik Ingo
Congratulations Patrick.

I guess this is a good time to thank you for your mentorship as I have
learned about and worked with the Cassandra community for 2½ years now.

henrik

On Thu, Feb 2, 2023 at 7:58 PM Benjamin Lerer  wrote:

> The PMC members are pleased to announce that Patrick McFadin has accepted
> the invitation to become committer today.
>
> Thanks a lot, Patrick, for everything you have done for this project and
> its community through the years.
>
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: CASSANDRA-14227 removing the 2038 limit

2023-02-03 Thread Henrik Ingo
 represent these times as deltas from the nowInSec
> being used to process the query. So, long math would only be used to
> normalise the times to this nowInSec (from whatever is stored in the
> sstable) within a method, and ints would be stored in memtables and any
> objects used for processing.
>
> This might admittedly be more work, but I don’t believe it should be too
> challenging - we can introduce a method deletionTime(int nowInSec) that
> returns a long value by adding nowInSec to the deletionTime, and make the
> underlying value private, refactoring call sites?
>
> On 29 Sep 2022, at 09:37, Berenguer Blasi 
>  wrote:
>
> Hi all,
>
> I have taken a stab in a PR you can find attached in the ticket. Mainly:
>
> - I have moved deletion times, gc and nowInSec timestamps to long. That
> should get us past the 2038 limit.
>
> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a
> 'free' guardrail.
>
> - A new NONE overflow policy is the default but everything is backwards
> compatible by keeping the previous ones in place. Think upgrade scenarios
> or apps relying on the previous behavior.
>
> - The new limit is around year 292,471,208,677 which sounds ok given the
> Sun will start collapsing in 3 to 5 billion years :-)
>
> - Please feel free to drop by the ticket and take a look at the PR even if
> it's cursory
>
> Thx in advance.
>
>
>
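
To make the suggestion concrete, here is a minimal sketch of the delta-based
accessor Benedict describes above. Class and field names are hypothetical,
not the actual Cassandra code:

    // Sketch only: store an int delta from a reference nowInSec and widen
    // to long at a single boundary, so values past 2038 cannot overflow.
    public final class Expiring
    {
        // Seconds relative to the query's nowInSec, not an absolute timestamp.
        private final int deletionTimeDelta;

        public Expiring(int deletionTimeDelta)
        {
            this.deletionTimeDelta = deletionTimeDelta;
        }

        // The only place long math is needed; call sites receive a long.
        public long deletionTime(int nowInSec)
        {
            return (long) nowInSec + deletionTimeDelta;
        }
    }

Keeping the widening in exactly one accessor is what would make the proposed
refactoring of call sites tractable.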

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Merging incremental feature work

2023-02-03 Thread Henrik Ingo
ies was even explicitly split
into separate CEPs for the API refactor and the new functionality.


Perhaps Linus Torvalds said the above more succinctly than me:



*So the name of the game is to _avoid_ decisions, at least the big and
painful ones. Making small and non-consequential decisions is fine, and
makes you look like you know what you're doing, so what a kernel manager
needs to do is to turn the big and painful ones into small things where
nobody really cares.*

*It helps to realize that the key difference between a big decision and a
small one is whether you can fix your decision afterwards. Any decision can
be made small by just always making sure that if you were wrong (and you
_will_ be wrong), you can always undo the damage later by backtracking.
Suddenly, you get to be doubly managerial for making _two_ inconsequential
decisions - the wrong one _and_ the right one.*


https://www.openlife.cc/onlinebook/epilogue-linux-kernel-management-style-linus-torvalds


(I particularly like the last sentence!)

henrik


On Fri, Feb 3, 2023 at 2:06 PM Josh McKenzie  wrote:

> The topic of how we handle merging large complex bodies of work came up
> recently with the CEP-15 merge and JDK17, and we've faced this question in
> the past as well (CASSANDRA-8099 comes to mind).
>
> The times we've done large bodies of work separately from trunk and then
> merged them in have their own benefits and costs, and the examples I can
> think of where we've merged in work to trunk incrementally with something
> flagged experimental have markedly different cost/benefits. Further, the
> two approaches have shaped the *way* we approached work quite differently
> with how we architected and tested things.
>
> My current thinking: I'd like to propose we all agree to move to merge
> work into trunk incrementally if it's either:
> 1) New JDK support
> 2) An approved CEP
>
> The bar for merging anything into trunk should remain:
> 1) 2 +1's from committers
> 2) Green CI (presently circle or ASF, in the future ideally ASF or an ASF
> analog env)
>
> I don't know if this is a generally held opinion and we just haven't
> discussed it and switched our general behavior yet, or if this is more
> controversial, so I won't burden this email with enumerating pros and cons
> of the two approaches until I get a gauge of the community's temperature.
>
> So - what do we think?
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: CASSANDRA-14227 removing the 2038 limit

2023-02-03 Thread Henrik Ingo
In that case I agree that increasing from 20 years is an interesting
opportunity but clearly out of scope for your current ticket.
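
For anyone double-checking the numbers in this thread, the limits fall out
of simple epoch arithmetic. A back-of-the-envelope sketch (an editor's
illustration, not project code):

    public final class EpochLimits
    {
        // Average Gregorian year (365.2425 days) in seconds.
        private static final double SECONDS_PER_YEAR = 31_556_952d;

        public static void main(String[] args)
        {
            // Signed 32-bit seconds since 1970 overflow after ~68 years: the 2038 limit.
            System.out.printf("signed int:   ~%.1f years -> ~2038%n",
                              Integer.MAX_VALUE / SECONDS_PER_YEAR);
            // Unsigned 32-bit seconds last ~136 years, i.e. until ~2106.
            System.out.printf("unsigned int: ~%.1f years -> ~2106%n",
                              (Math.pow(2, 32) - 1) / SECONDS_PER_YEAR);
        }
    }

A 20y TTL cap on top of the unsigned range is what pushes the practical
horizon out to 2086, as in the patch mentioned below.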

On Fri, Feb 3, 2023 at 3:48 PM Berenguer Blasi 
wrote:

> Hi,
>
> 20y is the current and historic value. 68y is what an integer can
> accommodate hence the current 2038 limit since the 1970 Unix epoch. I
> wouldn't make it a configurable value, off the top of my head it would make
> for some interesting bugs and debugging sessions when nodes had different
> values. Food for another ticket in any case imo.
>
> Regards
> On 3/2/23 14:18, Henrik Ingo wrote:
>
> Naive PHB questions to follow...
>
> Why are 68y and 20y special? Could you pick any value? Could we allow it
> to be configurable? (Last one probably overkill, just asking to
> understand...)
>
> If we can pick any values we want, instinctively I would personally
> suggest to have TTL higher than 20 years, but also kicking the can further
> than 2035, which is only 13 years from now. Just to suggest a specific
> number, why not 35y and 2071?
>
> henrik
>
> On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi 
> wrote:
>
>> Hi All,
>>
>> a version using Uints, 20y max TTL and kicking the can down the road
>> until 2086 has been put up for review #justfyi
>>
>> Regards
>> On 15/11/22 7:06, Berenguer Blasi wrote:
>>
>> Hi all,
>>
>> thanks for your answers!.
>>
>> To Benedict's point: in terms of the uvint encoding of deletionTime, it
>> is true that it happens here
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
>> But we also have a DeletionTime serializer here
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
>> that is writing an int and a long that would now write 2 longs.
>>
>> TTL itself (the delta) remains an int in the new PR so it should have no
>> effect on size.
>>
>> Did I reference the correct parts of the codebase? No sstable expert here.
>> On 14/11/22 19:28, Josh McKenzie wrote:
>>
>> in 2035 we'd hit the same problem again.
>>
>> In terms of "kicking a can down the road", this would be a pretty
>> vigorous kick. I wouldn't push back against this deferral. :)
>>
>> On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
>>
>>
>> I’m confused why we see *any* increase in sstable size - TTLs and
>> deletion times are already written as unsigned vints as offsets from an
>> sstable epoch for each value.
>>
>> I would dig in more carefully to explore why you’re seeing this increase?
>> For the same data there should be no change to size on disk.
>>
>>
>> On 14 Nov 2022, at 06:36, C. Scott Andreas 
>>  wrote:
>>
>> A 2-3% increase in storage volume is roughly equivalent to giving up the
>> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression
>> levels. This regression could be very expensive for storage-bound use cases.
>>
>> From the perspective of storage overhead, the unsigned int approach
>> sounds preferable.
>>
>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi 
>>  wrote:
>>
>> 
>>
>> Hi all,
>>
>> We have done some more research on c14227. The current patch for
>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead
>> of int. This approach does not have a negative impact on memtable memory
>> usage, as C* controls the memory used by the Memtable, but based on our
>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2
>> to 3%.
>>
>> As a mitigation to this problem it is possible to encode
>> *localDeletionTime* as a vint. It results in a 1% improvement but might
>> cause additional computations during compaction or some other operations.
>>
>> Benedict's proposal to keep on using ints for TTL but as a delta to
>> nowInSecond would work for memtables but not for work in the SSTable where
>> nowInSecond does not exist. Consequently, we would still suffer from the
>> impact on bytes flushed and bytes on disk.
>>
>> Another approach that was suggested is the use of unsigned integer. Java
>> 8 has an unsigned integer API that would allow us to use unsigned int for
>> TTLs. Based on computation unsigned ints would give us a maximum time of
>> 136 years since the Unix Epoch and therefore a maximum expiration timestamp
>> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough
>> breathing room though, otherwise in 2035 we'd hit the same problem again.

Re: Announcement: Performance testing for Cassandra

2023-02-06 Thread Henrik Ingo
Thanks, Marianne and Matt

Ever since I joined this ecosystem I've wanted to see the day when there
are end-to-end, full-scale performance tests running nightly directly on
upstream Cassandra. Thank you so much for your work towards that!

Come to think of it, thank you to everyone and anyone who worked on open
sourcing the multiple tools used in running those tests.

<3

henrik

On Mon, Feb 6, 2023 at 6:02 PM Marianne Lyne Manaog <
marianne.man...@ieee.org> wrote:

> Hi everyone,
>
> Matt and I have created a public repository that contains performance
> tests for Cassandra using the open-source Fallout tool that the community
> can benefit from. Fallout is an open-source tool for running large scale
> remote-based distributed correctness, verification and performance tests
> for Apache Cassandra. All components for running the performance tests are
> open-source and use Google Kubernetes Engine.
>
> At the moment, the repository contains 6 performance tests for LWT. We are
> still working on porting a few more tests into the repository. Everyone is
> welcome to contribute their own performance tests to the repository. Here
> is the link to the fallout-tests repository
> <https://github.com/datastax/fallout-tests>.
>
> One thing to note is that, at the moment, it is not possible to run
> performance tests on trunk but rather on specific versions of Cassandra.
> However, we are currently working on it.
>
> Marianne
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Downgradability

2023-02-22 Thread Henrik Ingo
>>
>>>> My understanding of what has been suggested so far translates to:
>>>> - avoid changes to sstable formats;
>>>> - if there are changes, implement them in a way that is
>>>> backwards-compatible, e.g. by duplicating data, so that a new version is
>>>> presented in a component or portion of a component that legacy nodes will
>>>> not try to read;
>>>> - if the latter is not feasible, make sure the changes are only applied
>>>> if a feature flag has been enabled.
>>>>
>>>> To me this approach introduces several risks:
>>>> - it bloats file and parsing complexity;
>>>> - it discourages improvement (e.g. CASSANDRA-17698 is no longer a LHF
>>>> ticket once this requirement is in place);
>>>> - it needs care to avoid risky solutions to address technical issues
>>>> with the format versioning (e.g. staying on n-versions for 5.0 and needing
>>>> a bump for a 4.1 bugfix might require porting over support for new
>>>> features);
>>>> - it requires separate and uncoordinated solutions to the problem and
>>>> switching mechanisms for each individual change.
>>>>
>>>> An alternative solution is to implement/complete CASSANDRA-8110
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-8110>, which provides
>>>> a method of writing sstables for a target version. During upgrades, a node
>>>> could be set to produce sstables corresponding to the older version, and
>>>> there is a very straightforward way to implement modifications to formats
>>>> like the tickets above to conform to its requirements.
>>>>
>>>> What do people think should be the way forward?
>>>>
>>>> Regards,
>>>> Branimir
>>>>
>>>>
>>>> --
>>> you are the apple of my eye !
>>>
>> --
>> http://twitter.com/tjake
>>
>>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Downgradability

2023-02-22 Thread Henrik Ingo
... ok, apparently shift+enter sends messages now?

I was just saying that if at least the file format AND system tables -
anything written to disk - can be protected with a switch, then it allows
for a quick downgrade by shutting down the entire cluster and restarting
with the downgraded binary. It's a start.

Doing that live in a distributed system requires considering much more:
gossip, streaming, drivers, and ultimately all features, because we don't
want an application developer to use a shiny new thing that a) may not be
available on all nodes, or b) may disappear if the cluster has to be
downgraded later.
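
As a rough illustration of the CASSANDRA-8110-style switch discussed in this
thread: the operator pins a maximum write version, and the node writes the
newest format at or below it. This sketch uses hypothetical names and
assumes, as with Cassandra's two-letter version codes ("ma", "nb", "oa",
...), that newer versions sort lexically higher:

    import java.util.List;

    final class SSTableWriteVersionGate
    {
        // Operator-pinned ceiling, e.g. "nb" while a rollback window is open.
        private final String maxWriteVersion;

        SSTableWriteVersionGate(String maxWriteVersion)
        {
            this.maxWriteVersion = maxWriteVersion;
        }

        // Newest supported version not exceeding the ceiling; the input list
        // is assumed sorted ascending, e.g. ["ma", "md", "nb", "oa"].
        String versionToWrite(List<String> supportedAscending)
        {
            String chosen = supportedAscending.get(0);
            for (String v : supportedAscending)
                if (v.compareTo(maxWriteVersion) <= 0)
                    chosen = v;
            return chosen;
        }
    }

With the ceiling at "nb", the node keeps producing "nb" sstables until the
operator lifts the pin, so a rolled-back binary can still read everything
written during the upgrade window.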

henrik

On Thu, Feb 23, 2023 at 1:14 AM Henrik Ingo 
wrote:

> Just this once I'm going to be really brief :-)
>
> Just wanted to share for reference how Mongodb implemented
> downgradeability around their 4.4 version:
> https://www.mongodb.com/docs/manual/release-notes/6.0-downgrade-sharded-cluster/
>
> Jeff you're right. Ultimately this is about more than file formats.
> However, ideally if at least the
>
> On Mon, Feb 20, 2023 at 10:02 PM Jeff Jirsa  wrote:
>
>> I'm not even convinced even 8110 addresses this - just writing sstables
>> in old versions won't help if we ever add things like new types or new
>> types of collections without other control abilities. Claude's other email
>> in another thread a few hours ago talks about some of these surprises -
>> "Specifically during the 3.1 -> 4.0 changes a column broadcast_port was
>> added to system/local.  This means that a 3.1 system cannot read the table
>> as it has no definition for it.  I tried marking the column for deletion in
>> the metadata and in the serialization header.  The latter got past the
>> column not found problem, but I suspect that it just means that data
>> columns after broadcast_port shifted and so incorrectly read." - this is a
>> harder problem to solve than just versioning sstables and network
>> protocols.
>>
>> Stepping back a bit, we have downgrade ability listed as a goal, but it's
>> not (as far as I can tell) universally enforced, nor is it clear at which
>> point we will be able to concretely say "this release can be downgraded to
>> X".   Until we actually define and agree that this is a real goal with a
>> concrete version where downgrade-ability becomes real, it feels like things
>> are somewhat arbitrarily enforced, which is probably very frustrating for
>> people trying to commit work/tickets.
>>
>> - Jeff
>>
>>
>>
>> On Mon, Feb 20, 2023 at 11:48 AM Dinesh Joshi  wrote:
>>
>>> I’m a big fan of maintaining backward compatibility. Downgradability
>>> implies that we could potentially roll back an upgrade at any time. While I
>>> don’t think we need to retain the ability to downgrade in perpetuity it
>>> would be a good objective to maintain strict backward compatibility and
>>> therefore downgradability until a certain point. This would imply
>>> versioning metadata and extending it in such a way that prior version(s)
>>> could continue functioning. This can certainly be expensive to implement
>>> and might bloat on-disk storage. However, we could always offer an option
>>> for the operator to optimize the on-disk structures for the current version
>>> then we can rewrite them in the latest version. This optimizes the storage
>>> and opens up new functionality. This means new features that can work with
>>> old on-disk structures will be available while others that strictly require
>>> new versions of the data structures will be unavailable until the operator
>>> migrates to the new version. This migration IMO should be irreversible.
>>> Beyond this point the operator will lose the ability to downgrade which is
>>> ok.
>>>
>>> Dinesh
>>>
>>> On Feb 20, 2023, at 10:40 AM, Jake Luciani  wrote:
>>>
>>> 
>>> There has been progress on
>>>
>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8928
>>>
>>> Which is similar to what datastax does for DSE. Would this be an
>>> acceptable solution?
>>>
>>> Jake
>>>
>>> On Mon, Feb 20, 2023 at 11:17 AM guo Maxwell 
>>> wrote:
>>>
>>>> It seems “An alternative solution is to implement/complete
>>>> CASSANDRA-8110 <https://issues.apache.org/jira/browse/CASSANDRA-8110>”
>>>> can give us more options if it is finished😉
>>>>
>>>> Branimir Lambov 于2023年2月20日 周一下午11:03写道:
>>>>
>>>>> Hi everyone,
&

Re: Downgradability

2023-02-23 Thread Henrik Ingo
On Thu, Feb 23, 2023 at 11:57 AM Benedict  wrote:

> Can somebody explain to me why this is being fought tooth and nail, when
> the work involved is absolutely minimal?
>
>
I don't know how each individual has been thinking about this, but just
looking at all the tasks, it seems to me that at least the introduction of
tries is a major format change anyway - since that's the whole point - and
therefore people working on other tasks may have assumed the format is
changing anyway, and that something like a switch (is this what is referred
to as the C-8110 solution?) will take care of it for everyone.

I'm not sure there's consensus that such a switch is a sufficient
resolution to this discussion, but if there were such a consensus, the next
question would be whether the patches that are otherwise ready now can
merge, or whether they will all be blocked waiting for the compatibility
solution first. And possibly better testing, etc. Letting them merge would
be justified by the desire to have more frequent and smaller increments of
work merged into trunk... well, I'm not going to repeat everything from
that discussion but the same pro's and con's apply.

henrik
-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: Downgradability

2023-02-23 Thread Henrik Ingo
Right. So I'm speculating everyone else who worked on a patch that breaks
compatibility has been working under the mindset "I'll just put this behind
the same switch". Or something more vague / even less correct, such as
assuming that tries would become the default immediately.

At least in my mind when we talk about the "switch to enable tries" I do
also consider things like "don't break streaming". So I guess whether one
considers "a switch" to exist already or not, might be subjective in this
case, because people have different assumptions on the definition of done
of such a switch.

henrik

On Thu, Feb 23, 2023 at 2:53 PM Benedict  wrote:

> I don’t think there’s anything about a new format that requires a version
> bump, but I could be missing something.
>
> We have to have a switch to enable tries already don’t we? I’m pretty sure
> we haven’t talked about switching the default format?
>
> On 23 Feb 2023, at 12:12, Henrik Ingo  wrote:
>
> 
> On Thu, Feb 23, 2023 at 11:57 AM Benedict  wrote:
>
>> Can somebody explain to me why this is being fought tooth and nail, when
>> the work involved is absolutely minimal?
>>
>>
> I don't know how each individual has been thinking about this, but it
> seems to me just looking at all the tasks that at least the introduction of
> tries is a major format change anyway - since it's the whole point - and
> therefore people working on other tasks may have assumed the format is
> changing anyway and therefore something like a switch (is this what is
> referred to as the C-8110 solution?) will take care of it for everyone.
>
> I'm not sure there's consensus that such a switch is a sufficient
> resolution to this discussion, but if there were such a consensus, the next
> question would be whether the patches that are otherwise ready now can
> merge, or whether they will all be blocked waiting for the compatibility
> solution first. And possibly better testing, etc. Letting them merge would
> be justified by the desire to have more frequent and smaller increments of
> work merged into trunk... well, I'm not going to repeat everything from
> that discussion but the same pro's and con's apply.
>
> henrik
> --
>
> Henrik Ingo
>
> c. +358 40 569 7354
>
> w. www.datastax.com
>
>
> <https://urldefense.com/v3/__https://www.facebook.com/datastax__;!!PbtH5S7Ebw!dOQqeDGZHgRdaV7zT4J-u7QGa4b2HCSNBgF8KrDldGjvy_guOGUws3L2sV2X5y_vzNYF7iZ85aa0n0n_sPsT$>
> <https://twitter.com/datastax>
> <https://urldefense.com/v3/__https://www.linkedin.com/company/datastax/__;!!PbtH5S7Ebw!dOQqeDGZHgRdaV7zT4J-u7QGa4b2HCSNBgF8KrDldGjvy_guOGUws3L2sV2X5y_vzNYF7iZ85aa0n2bcAuFd$>
> <https://github.com/datastax/>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Next release date

2023-03-01 Thread Henrik Ingo
 that we can't expect to merge
all of this work in the last week of April anyway. So from my point of view,
just as we have worked hard to get some of these big features in earlier,
it would not be completely wrong to allow some to finish their work in the
days and weeks after the official cutoff date. It seems this is my answer
to Mick's question 2a.


In contrast, I fear that if we postpone the branch date altogether, it will
delay everything and we will just have this same discussion in September
again.


For the remaining questions, I would also be interested to hear answers to
questions #1 and #2.

henrik



On Wed, Mar 1, 2023 at 3:38 PM Mick Semb Wever  wrote:

> My thoughts don't touch on CEPs inflight.
>>
>
>
>
> For the sake of broadening the discussion, additional questions I think
> worthwhile to raise are…
>
> 1. What third parties, or other initiatives, are invested and/or working
> against the May deadline? and what are their views on changing it?
>   1a. If we push branching back to September, how confident are we that
> we'll get to GA before the December Summit?
> 2. What CEPs look like not landing by May that we consider a must-have
> this year?
>   2a. Is it just tail-end commits in those CEPs that won't make it? Can
> these land (with or without a waiver) during the alpha phase?
>   2b. If the final components to specified CEPs are not
> approved/appropriate to land during alpha, would it be better if the
> project commits to a one-off half-year release later in the year?
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-22 Thread Henrik Ingo
Since Accord depends on transactional metadata... is there really any
alternative to what you propose?

Sure, if there is some subset of Accord that could be merged, while work
continues on a branch based on the CEP-21 branch, that would be
great. Merging even a prototype of Accord to trunk probably has marketing
value. (Don't laugh, many popular databases have had "atomic transactions,
except if anyone executes DDL simultaneously".)

On Tue, Mar 14, 2023 at 8:39 PM Caleb Rackliffe 
wrote:

> We've already talked a bit
> <https://lists.apache.org/list?dev@cassandra.apache.org:2023-1:Merging%20CEP-15%20to%20trunk>
> about how and when the current Accord feature branch should merge to trunk.
> Earlier today, the cep-21-tcm branch was created
> <https://lists.apache.org/thread/qkwnhgq02cn12jon2h565kh2gpzp9rry> to
> house ongoing work on Transactional Metadata.
>
> Returning to CASSANDRA-18196
> <https://issues.apache.org/jira/browse/CASSANDRA-18196> (merging Accord
> to trunk) after working on some other issues, I want to propose changing
> direction slightly, and make sure this makes sense to everyone else.
>
> 1.) There are a few minor Accord test issues in progress that I'd like to
> wrap up before doing anything, but those shouldn't take long. (See
> CASSANDRA-18302 <https://issues.apache.org/jira/browse/CASSANDRA-18302>
>  and CASSANDRA-18320
> <https://issues.apache.org/jira/browse/CASSANDRA-18320>.)
> 2.) Accord logically depends on Transactional Metadata.
> 3.) The new cep-21-tcm branch is going to have to be rebased to trunk on
> a regular basis.
>
> So...
>
> 4.) I would like to pause merging cep-15-accord to trunk, and instead
> rebase it on cep-21-tcm until such time as the latter merges to trunk, at
> which point cep-15-accord can be rebased to trunk again and then merged
> when ready, nominally taking up the work of CASSANDRA-18196
> <https://issues.apache.org/jira/browse/CASSANDRA-18196> again.
>
> Any objections to this?
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [EXTERNAL] [DISCUSS] Next release date

2023-03-27 Thread Henrik Ingo
Not so fast...

There's certainly value in spending that time stabilizing the already done
features. It's valuable triaging information to say this used to work
before CEP-21 and only broke after it.

That said, having a very long freeze of trunk, or alternatively having a
very long-lived 5.0 branch that is waiting for Accord and diverging from a
trunk that is not frozen... are both undesirable options. (A month or two
could IMO be discussed though.) So I agree with the concern from that point
of view, I just don't agree that having one batch of big features in a
stabilization period is zero value.


henrik



On Fri, Mar 24, 2023 at 5:23 PM Jeremiah D Jordan 
wrote:

> Given the fundamental change to how cluster operations work coming from
> CEP-21, I’m not sure what freezing early for “extra QA time” really buys
> us?  I wouldn’t trust any multi-node QA done pre commit.
> What “stabilizing” do we expect to be doing during this time?  How much of
> it do we just have to do again after those things merge?  I for one do not
> like to have release branches cut months before their expected release.  It
> just adds extra merge forward and “where should this go”
> questions/overhead.  It could make sense to me to branch branch when CEP-21
> merges and only let in CEP-15 after that.  CEP-15 is mostly “net new stuff”
> and not “changes to existing stuff” from my understanding?  So no QA effort
> wasted if it is done before it merges.
>
> -Jeremiah
>
> On Mar 24, 2023, at 9:38 AM, Josh McKenzie  wrote:
>
> I would like to propose a partial freeze of 5.0 in June
>
> My .02:
> +1 to:
> * partial freeze on an agreed upon date w/agreed upon other things that
> can optionally go in after
> * setting a hard limit on when we ship from that frozen branch regardless
> of whether the features land or not
>
> -1 to:
> * ever feature freezing trunk again. :)
>
> I worry about the labor involved with having very large work like this
> target a frozen branch and then also needing to pull it up to trunk. That
> doesn't sound fun.
>
> If we resurrected the discussion about cutting alpha snapshots from trunk,
> would that change people's perspectives on the weight of this current
> decision? We'd probably also have to re-open pandora's box talking about
> the solidity of our API's on trunk as well if we positioned those alphas as
> being stable enough to start prototyping and/or building future
> applications against.
>
> On Fri, Mar 24, 2023, at 9:59 AM, Brandon Williams wrote:
>
> I am +1 on a 5.0 branch freeze.
>
> Kind Regards,
> Brandon
>
> On Fri, Mar 24, 2023 at 8:54 AM Benjamin Lerer  wrote:
> >>
> >> Would that be a trunk freeze, or freeze of a cassandra-5.0 branch?
> >
> >
> > I was thinking of a cassandra-5.0 branch freeze. So branching 5.0 and
> allowing only CEP-15 and 21 + bug fixes there.
> > Le ven. 24 mars 2023 à 13:55, Paulo Motta  a
> écrit :
> >>
> >> >  I would like to propose a partial freeze of 5.0 in June.
> >>
> >> Would that be a trunk freeze, or freeze of a cassandra-5.0 branch? I
> agree with a branch freeze, but not with trunk freeze.
> >>
> >> I might work on small features after June and would be happy to delay
> releasing these on 5.0+, but delaying merge to trunk until 5.0 is released
> could be disruptive to contributors workflows and I would prefer to avoid
> that if possible.
> >>
> >> On Fri, Mar 24, 2023 at 6:37 AM Mick Semb Wever  wrote:
> >>>
> >>>
> >>>> I would like to propose a partial freeze of 5.0 in June.
> >>>>
> >>>> …
> >>>>
> >>>> This partial freeze will be valid for every new feature except CEP-21
> and CEP-15.
> >>>
> >>>
> >>>
> >>> +1
> >>>
> >>> Thanks for summarising the thread this way Benjamin. This addresses my
> two main concerns: letting the branch/release date slip too much into the
> unknown, squeezing GA QA efforts, while putting in place exceptional
> waivers for CEP-21 and CEP-15.
> >>>
> >>> I hope that in the future we will be more willing to commit to the
> release train model: less concerned about "what the next release contains";
> more comfortable letting big features land where they land. But this is
> opinion and discussion for another day… possibly looping back to the
> discussion on preview releases…
> >>>
> >>>
> >>> Do we have yet from anyone a (rough) eta on CEP-15 (post CEP-21)
> landing in trunk?
> >>>
> >>>
>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-27 Thread Henrik Ingo
> <https://issues.apache.org/jira/browse/CASSANDRA-18330>.
>
> In the meantime, I rebased cep-15-accord on trunk at
> commit 3eb605b4db0fa6b1ab67b85724a9cfbf00aae7de. The option to finish the
> remaining bits of CASSANDRA-18196
> <https://issues.apache.org/jira/browse/CASSANDRA-18196> and merge w/o TCM
> is still available, but it sounds like the question we want to answer is
> whether or not we build a throwaway patch for linearizable epochs in lieu
> of TCM?
>
> FWIW, I'd still rather just integrate w/ TCM ASAP, avoiding integration
> risk while accepting the possible delivery risk.
>
> On Fri, Mar 24, 2023 at 9:32 AM Josh McKenzie 
> wrote:
>
>
> making sure that joining and leaving nodes update some state via Paxos
> instead of via gossip
>
> What kind of a time delivery risk does coupling CEP-15 with CEP-21
> introduce (i.e. unk-unk on CEP-21 leading to delay cascades to CEP-15)?
> Seems like having a table we CAS state for on epochs wouldn't be *too 
> *challenging,
> but I'm not familiar w/the details so I could be completely off here.
>
> Being able to deliver both of these things on their own timetable seems
> like a pretty valuable thing assuming the lift required would be modest.
>
> On Fri, Mar 24, 2023, at 6:15 AM, Benedict wrote:
>
>
> Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs
> linearizable epochs. This could be achieved with a much more modest patch,
> essentially avoiding almost all of the insertion points of cep-21, just
> making sure that joining and leaving nodes update some state via Paxos
> instead of via gossip, and assign an epoch as part of the update.
>
> It would be preferable to use cep-21 since it introduces this
> functionality, and our intention is to use cep-21 for this. But it isn’t a
> hard dependency.
>
>
> On 22 Mar 2023, at 20:58, Henrik Ingo  wrote:
>
> 
> Since Accord depends on transactional metadata... is there really any
> alternative to what you propose?
>
> Sure, if there is some subset of Accord that could be merged, while work
> continues on a branch based on the CEP-21 branch, that would be
> great. Merging even a prototype of Accord to trunk probably has marketing
> value. (Don't laugh, many popular databases have had "atomic transactions,
> except if anyone executes DDL simultaneously".)
>
> On Tue, Mar 14, 2023 at 8:39 PM Caleb Rackliffe 
> wrote:
>
> We've already talked a bit
> <https://lists.apache.org/list?dev@cassandra.apache.org:2023-1:Merging%20CEP-15%20to%20trunk>
>  about how and when the current Accord feature branch should merge to
> trunk. Earlier today, the cep-21-tcm branch was created
> <https://lists.apache.org/thread/qkwnhgq02cn12jon2h565kh2gpzp9rry> to
> house ongoing work on Transactional Metadata.
>
> Returning to CASSANDRA-18196
> <https://issues.apache.org/jira/browse/CASSANDRA-18196> (merging Accord
> to trunk) after working on some other issues, I want to propose changing
> direction slightly, and make sure this makes sense to everyone else.
>
> 1.) There are a few minor Accord test issues in progress that I'd like to
> wrap up before doing anything, but those shouldn't take long. (See
> CASSANDRA-18302 <https://issues.apache.org/jira/browse/CASSANDRA-18302>
>  and CASSANDRA-18320
> <https://issues.apache.org/jira/browse/CASSANDRA-18320>.)
> 2.) Accord logically depends on Transactional Metadata.
> 3.) The new cep-21-tcm branch is going to have to be rebased to trunk on
> a regular basis.
>
> So...
>
> 4.) I would like to pause merging cep-15-accord to trunk, and instead
> rebase it on cep-21-tcm until such time as the latter merges to trunk, at
> which point cep-15-accord can be rebased to trunk again and then merged
> when ready, nominally taking up the work of CASSANDRA-18196
> <https://issues.apache.org/jira/browse/CASSANDRA-18196> again.
>
> Any objections to this?
>
>
>
> --
>
>
> *Henrik Ingo*
>
> *c*. +358 40 569 7354
>
> *w*. *www.datastax.com <http://www.datastax.com/>*
>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-04 Thread Henrik Ingo
>>> KEYSPACE at least makes sense in the context that it is the unit that
>>> defines how those partition keys are going to be treated/replicated
>>>
>>> DATABASE may be ambiguous, but it's an ambiguity shared across the
>>> industry.
>>>
>>> Creating a new name like TABLESPACE or TABLEGROUP sounds horrible
>>> because it'll be unique to us in the world, and therefore unintuitive for
>>> new users.
>>>
>>>
>>>
>>> On Tue, Apr 4, 2023 at 9:36 AM Josh McKenzie 
>>> wrote:
>>>
>>>> I think there's competing dynamics here.
>>>>
>>>> 1) KEYSPACE isn't that great of a name; it's not a space in which keys
>>>> are necessarily unique, and you can't address things just by key w/out
>>>> their respective tables
>>>> 2) DATABASE isn't that great of a name either due to the aforementioned
>>>> ambiguity.
>>>>
>>>> Something like "TABLESPACE" or 'TABLEGROUP" would *theoretically*
>>>> better satisfy point 1 and 2 above but subjectively I kind of recoil at
>>>> both equally. So there's that.
>>>>
>>>> On Tue, Apr 4, 2023, at 12:30 PM, Abe Ratnofsky wrote:
>>>>
>>>> I agree with Bowen - I find Keyspace easier to communicate with. There
>>>> are plenty of situations where the use of "database" is ambiguous (like
>>>> "Could you help me connect to database x?"), but Keyspace refers to a
>>>> single thing. I think more software is moving towards calling these things
>>>> "namespaces" (like Kubernetes), and while "Keyspaces" is not a term used in
>>>> this way elsewhere, I still find it leads to clearer communication.
>>>>
>>>> --
>>>> Abe
>>>>
>>>>
>>>> On Apr 4, 2023, at 9:24 AM, Andrés de la Peña 
>>>> wrote:
>>>>
>>>> I think supporting DATABASE is a great idea.
>>>>
>>>> It's better aligned with SQL databases, and can save new users one of
>>>> the first troubles they find.
>>>>
>>>> Probably anyone starting to use Cassandra for the first time is going
>>>> to face the what is a keyspace? question in the first minutes. Saving that
>>>> to users with a more common name would be a victory for usability IMO.
>>>>
>>>> On Tue, 4 Apr 2023 at 16:48, Mike Adamson 
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'd like to propose that we add DATABASE to the CQL grammar as an
>>>> alternative to KEYSPACE.
>>>>
>>>> Background: While TABLE was introduced as an alternative for
>>>> COLUMNFAMILY in the grammar we have kept KEYSPACE for the container name
>>>> for a group of tables. Nearly all traditional SQL databases use DATABASE as
>>>> the container name for a group of tables so it would make sense for
>>>> Cassandra to adopt this naming as well.
>>>>
>>>> KEYSPACE would be kept in the grammar but we would update some logging
>>>> and documentation to encourage use of the new name.
>>>>
>>>> Mike Adamson
>>>>
>>>> --
>>>> [image: DataStax Logo Square] <https://www.datastax.com/>
>>>> *Mike Adamson*
>>>> Engineering
>>>> +1 650 389 6000 <16503896000> | datastax.com
>>>> <https://www.datastax.com/>
>>>> Find DataStax Online:
>>>> [image: LinkedIn Logo]
>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linkedin.com_company_datastax&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=uHzE4WhPViSF0rsjSxKhfwGDU1Bo7USObSc_aIcgelo&s=akx0E6l2bnTjOvA-YxtonbW0M4b6bNg4nRwmcHNDo4Q&e=>
>>>>[image: Facebook Logo]
>>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_datastax&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=uHzE4WhPViSF0rsjSxKhfwGDU1Bo7USObSc_aIcgelo&s=ncMlB41-6hHuqx-EhnM83-KVtjMegQ9c2l2zDzHAxiU&e=>
>>>>[image: Twitter Logo] <https://twitter.com/DataStax>   [image: RSS
>>>> Feed] <https://www.datastax.com/blog/rss.xml>   [image: Github Logo]
>>>> <https://github.com/datastax>
>>>>
>>>>
>>>>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Henrik Ingo
On Thu, Apr 6, 2023 at 4:16 PM Josh McKenzie  wrote:

> KEYSPACE is fine. If we want to introduce a standard nomenclature like
> DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no
> benefit.
>
> I'm with Benedict in principle, with Aleksey in practice; I think KEYSPACE
> and SCHEMA are actually fine enough.
>
>
Having learned that SCHEMA already exists as a synonym for KEYSPACE, I
think everything is good here. If Cassandra evolves to a richer database
(transactions and queries beyond just key-based access) then gradually
adopting SCHEMA as the primary name might feel natural. Once we get there.
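
For reference, the synonym already works in CQL today; both statements below
are accepted and equivalent:

    CREATE KEYSPACE cycling
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    CREATE SCHEMA cycling2
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

Adding DATABASE as a third synonym would follow the same pattern.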


> If and when we get to any kind of multi-tenancy, having a more
> metaphorical abstraction that users are familiar with like these becomes
> more valuable; it's pretty clear that things in different keyspaces,
> different databases, or even different schemas could have different access
> rules, resourcing, etc from one another.
>
>
At Datastax I've tried, with some success actually, to ban the use of the
word "Database" in our cloud service, because it was too overloaded. Various
people - one group being the UI designers, who expose their point of view to
actual users - had completely different ideas of what a "database" is. I
remember at least:
 - the cluster of servers / VMs in the cloud that together contain a
Cassandra database. => It's a cluster.
 - One tenant in a multi-tenant cluster => It's a tenant
 - A KEYSPACE. This would have been most correct in my world view, but was
actually the least used. => KEYSPACE or SCHEMA
 - The software product: Cassandra, DSE, or Astra

I think the first two were the ones actually used in the UI.

Now that I think about this email thread, the different expectations of
what the word "database" means might correlate with whether the speaker's
background is in the Oracle/PostgreSQL/Microsoft camp, or the MySQL/MongoDB
camp.


So it's like me trying to order a biscuit in a US cafe.

henrik





> While the off-the-cuff logical TABLEGROUP thing is a *literal* statement
> about what the thing is, it'd be another unique term to us;  we have enough
> things in our system where we've charted our own path. My personal .02 is
> we don't need to go adding more. :)
>
> On Thu, Apr 6, 2023, at 8:54 AM, Mick Semb Wever wrote:
>
>
> … but that should be a different discussion about how we evolve config.
>
>
>
> I disagree. Nomenclature being difficult can benefit from holistic and
> forward thinking.
> Sure you can label this off-topic if you like, but I value our discuss
> threads being collaborative in an open-mode. Sometimes the best idea is on
> the tail end of a sequence of bad and/or unpopular ideas.
>
>
>
>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-04-12 Thread Henrik Ingo
Wait... Why would anything require ALLOW FILTERING if the partition key is
defined? That seems to contradict documentation:
https://cassandra.apache.org/doc/latest/cassandra/cql/dml.html#allow-filtering

Also my intuition / expectation matches what the manual says.

henrik

On Fri, Apr 7, 2023 at 12:01 AM Jeremy Hanna 
wrote:

> Considering all of the examples require using ALLOW FILTERING with the
> partition key specified, I think it's appropriate to consider separating
> out use of ALLOW FILTERING within a partition versus ALLOW FILTERING across
> the whole table.  A few years back we had a discussion about this in ASF
> slack in the context of capability restrictions and it seems relevant
> here.  That is, we don't want people to get comfortable using ALLOW
> FILTERING across the whole table.  However, there are times when ALLOW
> FILTERING within a partition is reasonable.
>
> Ticket to discuss separating them out:
> https://issues.apache.org/jira/browse/CASSANDRA-15803
> Summary: Perhaps add an optional [WITHIN PARTITION] or something similar
> to make it backwards compatible and indicate that this is purely within the
> specified partition.
>
> This also gives us the ability to disallow table-scan types of ALLOW
> FILTERING from a guardrail perspective, because the intent is explicit.
> So operators could disallow ALLOW FILTERING but allow ALLOW FILTERING
> WITHIN PARTITION, or whatever is decided.
>
> I do NOT want to hijack a good discussion but I thought this separation
> could be useful within this context.
>
> Jeremy
>
> On Apr 6, 2023, at 3:00 PM, Patrick McFadin  wrote:
>
> I love that this is finally coming to Cassandra. Absolutely hate that,
> once again, we'll be endorsing the use of ALLOW FILTERING. This is an
> anti-pattern that keeps getting legitimized.
>
> Hot take: Should we just not do Milestones 1 and 2 and wait for an
> index-only Milestone 3?
>
> Patrick
>
> On Thu, Apr 6, 2023 at 10:04 AM David Capwell  wrote:
>
>> Overall I welcome this feature, was trying to use this around 1-2 months
>> back and found we didn’t support, so glad to see it coming!
>>
>> From a testing point of view, I think we would want to have good fuzz
>> testing covering complex types (frozen/non-frozen collections, tuples, udt,
>> etc.), and reverse ordering; both sections tend to cause the most problem
>> for new features (and existing ones)
>>
>> We also will want a way to disable this feature, and optionally disable
>> at different sections (such as m2’s NOT IN for partition keys).
>>
>> > On Apr 4, 2023, at 2:28 AM, Piotr Kołaczkowski 
>> wrote:
>> >
>> > Hi everyone!
>> >
>> > I created a new CEP for adding NOT support to the query language and
>> > want to start discussion around it:
>> >
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator
>> >
>> > Happy to get your feedback.
>> > --
>> > Piotr
>>
>>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-04-13 Thread Henrik Ingo
On Thu, Apr 13, 2023 at 10:20 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Somebody correct me if I am wrong but "partition key" itself is not enough
> (primary keys = partition keys + clustering columns). It will require ALLOW
> FILTERING when clustering columns are not specified either.
>
> create table ks.tb (p1 int, c1 int, col1 int, col2 int, primary key (p1,
> c1));
> select * from ks.tb where p1 = 1 and col1 = 2; // this will require
> allow filtering
>
> The documentation seems to omit this fact.
>

It does seem so.

That said, I personally was assuming - and would still argue it's the
optimal choice - that the documentation was right and reality is wrong.

If there is a partition key, then the query can avoid scanning the entire
table, across all nodes, potentially petabytes.

If a query specifies a partition key but not the full clustering key, of
course there will be some scanning needed, but this is marginal compared to
the need to scan the entire table. Even in the worst case, a partition with
2 billion cells, we are talking about seconds to filter the result from the
single partition.
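
To make the two cases concrete, using the schema Stefan quotes above
(illustration only):

    create table ks.tb (p1 int, c1 int, col1 int, col2 int, primary key (p1, c1));

    -- filters within one partition: bounded work, arguably should not need
    -- ALLOW FILTERING at all
    select * from ks.tb where p1 = 1 and col1 = 2 allow filtering;

    -- filters across the whole table on every node: the genuinely expensive case
    select * from ks.tb where col1 = 2 allow filtering;

The first query is bounded by the size of a single partition; the second
fans out to the entire cluster.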

> Aha I get what you all mean:

No, I actually think both are unnecessary. But yeah, certainly this latter
case is a bug?

henrik

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond published EOL

2023-04-17 Thread Henrik Ingo
> What EOL means is fuzzy. Does the project still fix CVEs, will there be
> infrastructure if someone wants to fix something, etc.  So at a minimum I
> would expect documentation and agreement around those things.
>
> If you look at Ubuntu and Java they distinguish between LTS releases and
> normal releases - but they are also doing this for a long time. The quicker
> release cycle (a new release every year) is sort of new-ish and hasn't been
> digested by all operators and users. So given 3.11 only extra support for a
> limited time to aid the transition like OpenJDK is doing for Java 8 might
> be prudent - Mick raises a valid point that if we go out and say "this is
> the new EOL, but this time we mean it" might encourage people to hope for
> another extension. I have no good answer other than communicate harder and
> more clearly - the status quo lacks clarity which is worse.
>
> The other point Mick raises - which releases to support - gets to another
> discussion: As of today operators need to upgrade every two years (and also
> jump versions), i.e. I would need to go 3.11->4.1 right when it came out to
> get the full two years of "support". I might feel uncomfortable going to a
> release which has just been released so realistically I need to update in
> between one and two years - give or take. This raises the question if we
> should dedicate some versions as LTS releases meaning they get longer
> support. Five years is common but that is also up for discussion. As an
> added benefit if there are commercial entities wanting to offer paid
> support they could focus on the LTS releases and bundle resources for the
> upstream support.
>
> This is a good discussion and I feel especially the implied CVE support
> needs to be more formalized.
>
> Thanks for indulging me,
> German
>
> --
> *From:* Jacek Lewandowski 
> *Sent:* Thursday, April 13, 2023 11:23 PM
> *To:* dev@cassandra.apache.org 
> *Subject:* Re: [EXTERNAL] Re: (CVE only) support for 3,11 beyond
> published EOL
>
> To me, as this is an open source project, we, the community, do not have
> to do anything, we can, but we are not obliged to, and we usually do that
> because we want to :-)
>
> To me, EOL means that we move focus to newer releases. Not that we are
> forbidden to do anything in the older ones. One formal point though is the
> machinery - as long as we have the machinery to test and release, that's
> all we need. However, in face of coming changes in testing, I suppose some
> extra effort will have to be done to support older versions. Finding people
> who want to help out with that could be a kind of validation whether that
> effort is justified.
>
> btw. We have recently agreed to keep support for M sstables format (3.0 -
> 3.11).
>
> thanks,
> - - -- --- -  -
> Jacek Lewandowski
>
>
> czw., 13 kwi 2023 o 21:59 Mick Semb Wever  napisał(a):
>
> Yes, this would be great. Right now users are confused what EOL means and
> what they can expect.
>
>
>
> I think the project would need to land on an agreed position.  I tried to
> find any reference to my earlier statement around CVEs on the latest
> unmaintained branch but could not find it (I'm sure it was mentioned
> somewhere :(
>
> How many past branches?  All CVEs?  What if CVEs are in dependencies?
> And is this a slippery slope, will such a formalised and documented
> commitment lead to more users on EOL versions? (see below)
> How do other committers feel about this?
>
>
> I am also asking specifically for 3.11 since this release has been around
> so long that it might warrant longer support than what we would offer for
> 4.0.
>
>
>
> This logic can also be the other way around :-)
>
> We should be sending a clear signal that OSS users are expected to perform
> a major upgrade every ~two years.  Vendors can, and are welcome to solve
> this, but the project itself does not support any user's production system,
> it only maintains code branches and performs releases off them, with our
> focus on quality solely on those maintained branches.
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Next release date

2023-04-17 Thread Henrik Ingo
Trying to collect a few loose ends from across this thread

*> I'm receptive to another definition of "stabilize", *

I think the stabilization period implies more than just CI, which is mostly
a function of unit tests working correctly. For example, at Datastax we
have run a "large scale" test with >100 nodes, over several weeks, both for
4.0 and 4.1. For obvious reasons such tests can't run in nightly CI builds.

Also it is not unusual that during the testing phase developers or
specialized QA engineers can develop new tests (which are possibly added to
CI) to improve coverage for and especially targeting new features in the
release. For example the fixes to Paxos v2 were found by such work before
4.1.

Finally, maybe it's a special case relevant only for this release, but as
a significant part of the Datastax team has been focused on porting these
large existing features from DSE, and on getting them merged before the
original May date, we also have tens of bug fixes waiting to be upstreamed.
(It used to be an even 100, but I'm unsure what the count is today.)

In fact! If you are worried about how to occupy yourself between a May
"soft freeze" and a September-ish hard freeze, you are welcome to chug on
that backlog. The bug fixes are already public and ASL licensed, in the 4.0
based branch here <https://github.com/datastax/cassandra/commits/ds-trunk>.

*> 3a. If we allow merge of CEP-15 / CEP-21 after branch, we risk
invalidating stabilization and risk our 2023 GA date*

I think this is the assumption that I personally disagree with. If this is
true, why do we even bother running any CI before the CEP-21 merge? It will
all be invalidated anyway, right?

In my experience, it is beneficial to test as early as possible, and at
different checkpoints during development. If we didn't do it and we found
some issue in late November, the window to search for the commit that
introduced the regression would be all the way back to the 4.1 GA. If on the
other hand the same test was already run during the soft freeze, then we
know we can focus our search on CEP-15 and CEP-21.


*> get comfortable with cutting feature previews or snapshot alphas like we
agreed to for earlier access to new stuff*

Snapshots are in fact a valid compromise proposal: A snapshot would provide
a constant version / point in time to focus testing on, but on the other
hand would allow trunk (or the 5.0 branch, in other proposals) to remain
open to new commits. Somewhat "invalidating" the testing work, but
presumably the branch will be relatively calm anyway. Which leads me to 2
important questions:

*WHO would be actively merging things into 5.0 during June-August?*

By my count at that point I expect most contributors to either furiously
work on Accord and TCM, or work on stabilization (tests, fixes).

Also, if someone did contribute new feature code during this time, they
might find it hard to get priority for reviews, if others are focused on
the above tasks.

Finally, I expect most Europeans to be on vacation 33% of that time.
Non-Europeans may want to try it too!


*WHAT do we expect to get merged during June-August?*

Compared to the tens of thousands of lines of code being merged by Accord,
SAI, UCS and Tries... I imagine even the worst case during a non-freeze in
June-August would be just a tiny percentage of the large CEPs.

In this thread I only see Paulo announcing an intent to commit against
trunk during a soft freeze, and even he agrees with a 5.0 branch freeze.

This last question is basically a form of saying I hope we aren't
discussing a problem that doesn't even exist?

henrik

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Next release date

2023-04-18 Thread Henrik Ingo
I forgot one last night:

From Benjamin we have a question that I think went unanswered?

*> Should it not facilitate the work if the branch stops changing heavily?*

This is IMO a good perspective. To me it seems weird to be too hung up on a
"hard limit" on a specific day, when we are talking about merges where a
single merge / rebase takes more than one day. We will have to stop merging
smaller work to trunk anyway, when CEP-21 is being merged. No?

henrik

On Tue, Apr 18, 2023 at 3:24 AM Henrik Ingo 
wrote:

> Trying to collect a few loose ends from across this thread
>
> *> I'm receptive to another definition of "stabilize", *
>
> I think the stabilization period implies more than just CI, which is
> mostly a function of unit tests working correctly. For example, at Datastax
> we have run a "large scale" test with >100 nodes, over several weeks, both
> for 4.0 and 4.1. For obvious reasons such tests can't run in nightly CI
> builds.
>
> Also, it is not unusual that during the testing phase developers or
> specialized QA engineers develop new tests (which are possibly added to
> CI) to improve coverage, especially targeting new features in the
> release. For example, the fixes to Paxos v2 were found by such work before
> 4.1.
>
> Finally, maybe it's a special case relevant only for this release, but as
> a significant part of the Datastax team has been focused on porting these
> large existing features from DSE, and to get them merged before the
> original May date, we also have tens of bug fixes waiting to be upstreamed
> too. (It used to be an even 100, but I'm unsure what the count is today.)
>
> In fact! If you are worried about how to occupy yourself between a May
> "soft freeze" and September'ish hard freeze, you are welcome to chug on
> that backlog. The bug fixes are already public and ASL licensed, in the 4.0
> based branch here <https://github.com/datastax/cassandra/commits/ds-trunk>.
>
> *> 3a. If we allow merge of CEP-15 / CEP-21 after branch, we risk
> invalidating stabilization and risk our 2023 GA date*
>
> I think this is the assumption that I personally disagree with. If this is
> true, why do we even bother running any CI before the CEP-21 merge? It will
> all be invalidated anyway, right?
>
> In my experience, it is beneficial to test as early as possible, and at
> different checkpoints during development. If we didn't, and we found some
> issue in late November, the window to search for the commit that
> introduced the regression would reach all the way back to the 4.1 GA. If on
> the other hand the same test had already run during the soft freeze, we
> would know to focus our search on CEP-15 and CEP-21.
>
>
> *> get comfortable with cutting feature previews or snapshot alphas like
> we agreed to for earlier access to new stuff*
>
> Snapshots are in fact a valid compromise proposal: A snapshot would
> provide a constant version / point in time to focus testing on, but on the
> other hand would allow trunk (or the 5.0 branch, in other proposals) to
> remain open to new commits. Somewhat "invalidating" the testing work, but
> presumably the branch will be relatively calm anyway. Which leads me to 2
> important questions:
>
> *WHO would be actively merging things into 5.0 during June-August? *
>
> By my count, at that point I expect most contributors to either furiously
> work on Accord and TCM, or work on stabilization (tests, fixes).
>
> Also, if someone did contribute new feature code during this time, they
> might find it hard to get priority for reviews, if others are focused on
> the above tasks.
>
> Finally, I expect most Europeans to be on vacation 33% of that time.
> Non-Europeans may want to try it too!
>
>
> *WHAT do we expect to get merged during June-August?*
>
> Compared to the tens of thousands of lines of code being merged by Accord,
> SAI, UCS and Tries... I imagine even the worst case during a non-freeze in
> June-August would be just a tiny percentage of the large CEPs.
>
> In this thread I only see Paulo announcing an intent to commit against
> trunk during a soft freeze, and even he agrees with a 5.0 branch freeze.
>
> This last question is basically a form of saying I hope we aren't
> discussing a problem that doesn't even exist?
>
> henrik
>
> --
>
> Henrik Ingo
>
> c. +358 40 569 7354
>
> w. www.datastax.com
>
> <https://www.facebook.com/datastax>  <https://twitter.com/datastax>
> <https://www.linkedin.com/company/datastax/>
> <https://github.com/datastax/>
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] Next release date

2023-04-19 Thread Henrik Ingo
I'm going to repeat the point from my own thread: rather than thinking of
this as some kind of concession to two exceptional CEPs, could we rather
take the point of view that they get their own space and time precisely
because they are large and invasive and both the merge and testing of them
will benefit from everything else in the branch quieting down?

I'm also not particularly interested in a feature freeze extending beyond
the 1-3 months that would serve the above purpose well.

In short: the proposal should not be that everyone else just has to sit
still and wait for two late stragglers. The proposal is merely to organise
work such that we maximise velocity and quality for merging CEP-15 and
CEP-21. Anything beyond that should be judged differently.

On Tue, 18 Apr 2023, 23:48 J. D. Jordan,  wrote:

> I also don’t really see the value in “freezing with exceptions for two
> giant changes to come after the freeze”.
>
> -Jeremiah
>
> On Apr 18, 2023, at 1:08 PM, Caleb Rackliffe 
> wrote:
>
> 
> > Caleb, you appear to be the only one objecting, and it does not appear
> that you have made any compromises in this thread.
>
> All I'm really objecting to is making special exceptions for particular
> CEPs in relation to our freeze date. In other words, let's not have a
> pseudo-freeze date and a "real" freeze date, when the thing that makes the
> latter supposedly necessary is a very invasive change to the database that
> risks our desired GA date. Also, again, I don't understand how cutting a
> 5.0 branch makes anything substantially easier to start testing. Perhaps
> I'm the only one who thinks this. If so, I'm not going to make further
> noise about it.
>
> On Tue, Apr 18, 2023 at 7:26 AM Henrik Ingo 
> wrote:
>
>> I forgot one last night:
>>
>> From Benjamin we have a question that I think went unanswered?
>>
>> *> Should it not facilitate the work if the branch stops changing
>> heavily?*
>>
>> This is IMO a good perspective. To me it seems weird to be too hung up on
>> a "hard limit" on a specific day, when we are talking about merges where a
>> single merge / rebase takes more than one day. We will have to stop merging
>> smaller work to trunk anyway, when CEP-21 is being merged. No?
>>
>> henrik
>>
>> On Tue, Apr 18, 2023 at 3:24 AM Henrik Ingo 
>> wrote:
>>
>>> Trying to collect a few loose ends from across this thread
>>>
>>> *> I'm receptive to another definition of "stabilize", *
>>>
>>> I think the stabilization period implies more than just CI, which is
>>> mostly a function of unit tests working correctly. For example, at Datastax
>>> we have run a "large scale" test with >100 nodes, over several weeks, both
>>> for 4.0 and 4.1. For obvious reasons such tests can't run in nightly CI
>>> builds.
>>>
>>> Also, it is not unusual that during the testing phase developers or
>>> specialized QA engineers develop new tests (which are possibly added to
>>> CI) to improve coverage, especially targeting new features in the
>>> release. For example, the fixes to Paxos v2 were found by such work before
>>> 4.1.
>>>
>>> Finally, maybe it's a special case relevant only for this release, but
>>> as a significant part of the Datastax team has been focused on porting
>>> these large existing features from DSE, and to get them merged before the
>>> original May date, we also have tens of bug fixes waiting to be upstreamed
>>> too. (It used to be an even 100, but I'm unsure what the count is today.)
>>>
>>> In fact! If you are worried about how to occupy yourself between a May
>>> "soft freeze" and September'ish hard freeze, you are welcome to chug on
>>> that backlog. The bug fixes are already public and ASL licensed, in the 4.0
>>> based branch here
>>> <https://github.com/datastax/cassandra/commits/ds-trunk>.
>>>
>>> *> 3a. If we allow merge of CEP-15 / CEP-21 after branch, we risk
>>> invalidating stabilization and risk our 2023 GA date*
>>>
>>> I think this is the assumption that I personally disagree with. If this
>>> is true, why do we even bother running any CI before the CEP-21 merge? It
>>> will all be invalidated anyway, right?
>>>
>>> In my experience, it is beneficial to test as early as possible, and at
>>> different checkpoints during development. If we didn't, and we found some
>>> issue in late November

Re: Adding vector search to SAI with hierarchical navigable small world graph index

2023-04-25 Thread Henrik Ingo
>> 1. Make sure that scatter/gather works across nodes. This should Just
>> Work, but I've already needed to tweak a couple things that I thought
>> would just work, so I'm calling this out.
>>
>> And then to be feature complete we would want to add:
>>
>> 1. CQL support for specifying the dimensionality of a column up front,
>> e.g. DENSE FLOAT32(1500).
>>    1. Pinecone accepts the first vector as the Correct dimensions, and
>>    throws an error if you give it one that doesn't match, but this
>>    doesn't work in a distributed system where the second vector might go
>>    to a different node than the first. Elastic requires specifying it up
>>    front like I propose here.
>> 2. Driver support.
>> 3. Support for updating the vector in a row that hasn't been flushed
>> yet, either by updating HNSW to support deletes, or by adding some kind
>> of invalidation marker to the overwritten vector.
>> 4. More performant inserts. Currently I have a big synchronized lock
>> around the HNSW graph. So we either need to shard it, like we do for the
>> other SAI indexes, or add fine-grained locking like ConcurrentSkipListMap
>> to make HnswGraphBuilder concurrent-ish. I prefer the second option,
>> since it allows us to avoid building the graph once in memory and then a
>> second time on flush, but it might be more work than it appears.
>> 5. Add index configurability options: similarity function and HNSW
>> parameters M and ef.
>> 6. Expose the similarity functions to CQL so you can SELECT x,
>> cosine_similarity(x, query_vector) FROM …
>>
>> Special thanks to Mike Adamson and Zhao Yang, who made substantial
>> contributions to the branch you see here and provided indispensable help
>> understanding SAI.
>> 
>>
>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Henrik Ingo
>> On Thu, Apr 27, 2023 at 7:49 PM Josh McKenzie
>> wrote:
>>
>>> From a machine learning perspective, vectors are a well-known concept
>>> that are effectively immutable fixed-length n-dimensional values that are
>>> then later used either as part of a model or in conjunction with a model
>>> after the fact.
>>>
>>> While we could have this be non-frozen and not call it a vector, I'd be
>>> inclined to still make the argument for a layer of syntactic sugar on top
>>> that met ML users where they were with concepts they understood rather than
>>> forcing them through the cognitive lift of figuring out the Cassandra
>>> specific contortions to replicate something that's ubiquitous in their
>>> space. We did the same "Cassandra-first" approach with our JSON support and
>>> that didn't do us any favors in terms of adoption and usage as far as I
>>> know.
>>>
>>> So is the goal here to provide something specific and idiomatic for the
>>> ML community or is the goal to make a primitive that's C*-centric that then
>>> another layer can write to? I personally argue for the former; I don't see
>>> this specific data type going away any time soon.
>>>
>>> On Thu, Apr 27, 2023, at 12:39 PM, David Capwell wrote:
>>>
>>> but as you point out it has the problem of allowing nulls.
>>>
>>>
>>> If nulls are not allowed for the elements, then either we need a) a
>>> new type, or b) some way to say elements may not be null… As much as I
>>> do like b, I am leaning towards a new type for this use case.
>>>
>>> So, to flesh out the type requirements I have seen so far
>>>
>>> 1) represents a fixed size array of element type
>>> * on write path we will need to validate this
>>> 2) element may not be null
>>> * on write path we will need to validate this
>>> 3) “frozen” (is this really a requirement for the type or is this
>>> just simpler for the ANN work?  I feel that this shouldn’t be a requirement)
>>> 4) works for all types (my requirement; original proposal is float only,
>>> but could logically expand to primitive types)
>>>
>>> Anything else?
>>>
>>> The key thing about a vector is that unlike lists or tuples you really
>>> don't care about individual elements, you care about doing vector and
>>> matrix multiplications with the thing as a unit.
>>>
>>>
>>> That maybe true for this use case, but “should” this be true for the
>>> type itself?  I feel like no… if a user wants the Nth element of a vector
>>> why would we block them?  I am not saying the first patch, or even 5.0 adds
>>> support for index access, I am just trying to push back saying that the
>>> type should not block this.
>>>
>>> (Maybe this is making the case for VECTOR FLOAT[N] rather than FLOAT
>>> VECTOR[N].)
>>>
>>>
>>> Now that nulls are not allowed, I have mixed feelings about FLOAT[N], I
>>> prefer this syntax but that limitation may not be desired for all use
>>> cases… we could always add LIST and ARRAY later
>>> to address that case.
>>>
>>> In terms of syntax I have seen, here is my ordered preference:
>>>
>>> 1) TYPE[size] - have mixed feelings due to non-null, but still prefer it
>>> 2) QUALIFIER TYPE[size] - QUALIFIER is just a Term we use to denote this
>>> semantic…. Could even be NON NULL TYPE[size]
>>>
>>> On Apr 27, 2023, at 9:00 AM, Benedict  wrote:
>>>
>>>
>>> That’s a bounded ring buffer, not a fixed length array.
>>>
>>> This definitely isn’t a tuple because the types are all the same, which
>>> is pretty crucial for matrix operations. Matrix libraries generally work on
>>> arrays of known dimensionality, or sparse representations.
>>>
>>> Whether we draw any semantic link between the frozen list and whatever
>>> we do here, it is fundamentally a frozen list with a restriction on its
>>> size. What we’re defining here are “statically” sized arrays, whereas a
>>> frozen list is essentially a dynamically sized array.
>>>
>>> I do not think vector is a good name because vector is used in some
>>> other popular languages to mean a (dynamic) list, which is confusing when
>>> we also have a list concept.
>>>
>>> I’m fine with just using the FLOAT[N] syntax, and drawing no direct link
>>> with list. Though it is a bit strange that this particular type declaration
>>> looks so different to other collection types.
>>>
>>> On 27 Apr 2023, at 16:48, Jeff Jirsa  wrote:
>>>
>>> 
>>>
>>>
>>> On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis 
>>> wrote:
>>>
>>> It's been a while, so I may be missing something, but do we already have
>>> fixed-size lists?  If not, I don't see why we'd try to make this fit into a
>>> List-shaped problem.
>>>
>>>
>>> We do not. The proposal got closed as wont-fix
>>> https://issues.apache.org/jira/browse/CASSANDRA-9110
>>>
>>>
>>>
>>>
>>
>> --
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced
>>
>>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>

-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Henrik Ingo
By my superficial reading I get the impression that the main distinction is
that vectors don't need to support random access into a single
element/float. I haven't looked at what Jonathan is doing, but I assume
(and it seems Jonathan assumes or knows) that this makes the implementation
easier and allows for important optimizations. Am I following correctly
here?

(Apologies if that is what your #1 is saying, I read yours as something
about secondary or maybe clustered indexes?)

Agree with #3 obviously.

#2... Vectors actually *could* support ordered (n-dimensional) indexes,
since they are vectors. But in practice it seems even asking for a simple
3D index is too much and too niche for anything other than PostGIS.
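
To make the first point above concrete, here is a minimal sketch (my own
illustration; the class and method names are invented, this is not
Jonathan's code): if there is no per-element access path, the vector can
live as one packed buffer and similarity functions can consume it as a
unit.

    import java.nio.ByteBuffer;

    // Sketch: a fixed-dimension float vector stored as one packed buffer.
    // There is deliberately no get(i); callers consume the whole vector.
    final class PackedFloatVector
    {
        private final float[] values; // dimension fixed at construction, no nulls possible

        PackedFloatVector(float[] values)
        {
            this.values = values.clone();
        }

        // Serialized form is just 4 bytes per element, nothing else.
        ByteBuffer serialize()
        {
            ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES);
            for (float v : values)
                buf.putFloat(v);
            buf.flip();
            return buf;
        }

        static float cosineSimilarity(float[] a, float[] b)
        {
            float dot = 0, normA = 0, normB = 0;
            for (int i = 0; i < a.length; i++)
            {
                dot += a[i] * b[i];
                normA += a[i] * a[i];
                normB += b[i] * b[i];
            }
            return dot / (float) (Math.sqrt(normA) * Math.sqrt(normB));
        }
    }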

henrik

On Fri, Apr 28, 2023 at 8:50 PM Benedict  wrote:

> I and others have claimed that an array concept will work, since it is
> isomorphic with a vector. I have seen the following counterclaims:
>
> 1. Vectors don’t need to support index lookups
> 2. Vectors don’t need to support ordered indexes
> 3. Vectors don’t need to support other types besides float
>
> None of these say that *vectors are not arrays*. At most these say “ANN
> indexes should only support float types” which is different, and not
> something I would dispute.
>
> If the claim is "there is no concept of arrays that is compatible with
> vector search" then let’s focus on that, because that is probably the
> initial source of the disconnect.
>
>
>
>
> On 28 Apr 2023, at 18:13, Henrik Ingo  wrote:
>
> 
> Benedict, I don't quite see why that matters? The argument is merely that
> this kind of vector, for this use case, a) is different from arrays, and b)
> arrays apparently don't serve the use case well enough (or at all).
>
> Now, if from the above it follows a discussion that a vector type cannot
> be a first class Cassandra type... that is of course a possible argument.
>
> But suggesting that Jonathan should work on implementing general purpose
> arrays seems to fall outside the scope of this discussion, since the result
> of such work wouldn't even fill the need Jonathan is targeting for here. I
> could also ask Jonathan to work on a JSONB data type, and it similarly
> would not be an interesting proposal to Jonathan, as it wouldn't fill the
> need for the specific use case he is targeting.
>
>
> But back to the main question... Why wouldn't a "vector for floats" type
> be general purpose enough that it should be delegated to some plugin?
> Machine Learning is a broad field in itself, with dozens of algorithms you
> could choose to use to build an AI model. And AI can be used in pretty much
> every industry vertical. If anything, I would claim DECIMAL is much more an
> industry specific special case type than these ML vectors would be.
>
>
>
> Back to Jonathan:
> >So in order of what makes sense to me:
> > 1. Add a vector type for just floats; consider adding bytes later if
> demand materializes. This gives us 99% of the value and limits the scope so
> we can deliver quickly.
> > 2. Add a vector type for floats or bytes. This gives us another 1% of
> value in exchange for an extra 20% or so of effort.
>
> Is it possible to implement 1 in a way that makes 2 possible in a future
> version?
>
> henrik
>
> On Fri, Apr 28, 2023 at 7:33 PM Benedict  wrote:
>
>> pgvector is a plug-in. If you were proposing a plug-in you could ignore
>> these considerations.
>>
>> On 28 Apr 2023, at 16:58, Jonathan Ellis  wrote:
>>
>> 
>> I'm proposing a vector data type for ML use cases.  It's not the same
>> thing as an array or a list and it's not supposed to be.
>>
>> While it's true that it would be possible to build a vector type on top
>> of an array type, it's not necessary to do it that way, and given the lack
>> of interest in an array type for its own sake I don't see why we would want
>> to make that a requirement.
>>
>> It's relevant that pgvector, which among the systems offering vector
>> search is based on the most similar system to Cassandra in terms of its
>> query language, adds a vector data type that only supports floats *even
>> though postgresql already has an array data type* because the semantics are
>> different.  Random access doesn't make sense, string and collection and
>> other datatypes don't make sense, typical ordered indexes don't make sense,
>> etc.  It's just a different beast from arrays, for a different use case.
>>
>> On Fri, Apr 28, 2023 at 10:40 AM Benedict  wrote:
>>
>>> But you’re proposing introducing a general purpose type - th

Re: [DISCUSS] The future of CREATE INDEX

2023-05-17 Thread Henrik Ingo
I have read the thread but chose to reply to the top message...

I'm coming to this with the background of having worked with MySQL, where
both the storage engine and index implementation had many options, and
often of course some index types were only available in some engines.

I would humbly suggest:

1. What's up with naming anything "legacy"? Calling the current index type
"2i" seems perfectly fine to me. From what I've heard it can work great
for many users.

2. It should be possible to always specify the index type explicitly. In
other words, it should be possible to CREATE CUSTOM INDEX ... USING "2i"
(if it isn't already).

2b) It should be possible to just say "SAI" or "SASIIndex", not the full
Java path. (See the sketch after this list.)

3. It's a fair point that the "CUSTOM" word may make this sound a bit too
special... The simplest change IMO is to just make the CUSTOM keyword
optional.

4. Benedict's point that a YAML option is per node is a good one... For
example, you wouldn't want some nodes to create a 2i index and other nodes
a SAI index for the same index. That said, how many other YAML options
can you think of that would create total chaos if different nodes actually
had different values for them? For example, what if a guardrail allowed some
action on some nodes but not others? Maybe what we need is a jira ticket
to enforce that certain sections of the config must not differ?

5. That said, the default index type could also be a property of the
keyspace.

6. MySQL allows the DBA to determine the default engine. This seems to work
well. If the user doesn't care, they don't care, if they do, they use the
explicit syntax.
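
To make 2/2b concrete, here is a rough sketch of what the explicit syntax
could look like from a client. Note that the short "2i" alias is part of
the proposal, not current syntax; per Caleb's note below, only SAI-style
aliases such as 'StorageAttachedIndex' resolve today without the full Java
class path.

    import com.datastax.oss.driver.api.core.CqlSession;

    public class IndexDdlExample
    {
        public static void main(String[] args)
        {
            try (CqlSession session = CqlSession.builder().build())
            {
                // Explicit legacy 2i via a hypothetical short alias:
                session.execute("CREATE CUSTOM INDEX IF NOT EXISTS my_2i " +
                                "ON ks.tbl (my_text_col) USING '2i'");
                // SAI via its existing alias, no fully-qualified class name:
                session.execute("CREATE CUSTOM INDEX IF NOT EXISTS my_sai " +
                                "ON ks.tbl (my_other_col) USING 'StorageAttachedIndex'");
            }
        }
    }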

henrik


On Wed, May 10, 2023 at 12:45 AM Caleb Rackliffe 
wrote:

> Earlier today, Mick started a thread on the future of our index creation
> DDL on Slack:
>
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>
> At the moment, there are two ways to create a secondary index.
>
> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>
> This creates an optionally named legacy 2i on the provided table and
> column.
>
> ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>
> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> USING <class> [WITH OPTIONS = <map>]*
>
> This creates a secondary index on the provided table and column using the
> specified 2i implementation class and (optional) parameters.
>
> ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING
> 'StorageAttachedIndex'
>
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
> shorthand for the fully-qualified class name, which is also valid.)
>
> So what is there to discuss?
>
> The concern Mick raised is...
>
> "...just folk continuing to use CREATE INDEX  because they think CREATE
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
> doing 2i (when they think they are, and/or we definitely want them to be,
> using SAI)"
>
> To paraphrase, we want people to use SAI once it's available where
> possible, and the default behavior of CREATE INDEX could be at odds w/
> that.
>
> The proposal we seem to have landed on is something like the following:
>
> For 5.0:
>
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>
> (Note: How this would interact w/ the existing secondary_indexes_enabled
> YAML options isn't clear yet.)
>
> Post-5.0:
>
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity
> w/ it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
> would both be flexible enough to accommodate index implementation selection
> and prescriptive enough to force the user to make a decision (and wouldn't
> change the legacy behavior of the existing CREATE INDEX). In this world,
> creating a legacy 2i might look something like CREATE INDEX...USING
> `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>
> Eventually we would have a single enabled DDL statement for index creation
> that would be minimal but also explicit/able to handle some evolution.
>
> What does everyone think?
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [VOTE] Release Apache Cassandra 5.0-alpha1

2023-09-05 Thread Henrik Ingo
Reading this a week later, just wanted to point out that this discussion
already shows that releasing an alpha1 now was/is a good idea. It
immediately helped attract attention from a wider group of people, and we
can now address these issues with less stress than if all features were
already done, or there is an upcoming conference, or the Thanksgiving holiday
or Christmas...

People often forget that the release machinery itself is a complex
technical system that very rarely "just works" when once a year you change
branches and version numbers...

And thanks Justin for such a pedantic review so early in our process. Very
helpful and convenient!

henrik

On Tue, Aug 29, 2023 at 5:59 AM Justin Mclean  wrote:

> Hi,
>
> If I were to vote on this, it would be -1 (non-binding) due to
> non-compliance with ASF policy on releases.
>
> I checked:
> - signatures and hashes are correct
> - It looks like there might be compiled code in the release? [1][2]
> - LICENSE is missing some 3rd party code license information [5]. This
> contains code "Copyright DataStax, Inc." under ALv2, python-smhasher under
> MIT, OrderedDict under MIT (copyright Raymond Hettinger) and code from
> MagnetoDB under ALv2.
> - LICENSE has no mention of 3rd party CRC code in [10]
> - Note that any code under CC 4.0 is incompatible with the ALv2. [11]
> - LICENSE also doesn't mention this file [9]
> - In LICENSE LongTimSort.java incorrectly mentions two different copyright
> owners
> - In LICENSE, AbstractGuavaIterator.java is incorrectly mentioned as
> AbstractIterator.java
> - NOTICE seems OK but may also be missing some things due to missing 3rd
> party code in LICENSE under ALv2
> - Files are missing ASF headers [3][4][6][7][8]; are these 3rd party files?
> - I didn't try compiling from the source
>
> Kind Regards,
> Justin
>
> 1. ./test/data/serialization/3.0/utils.BloomFilter1000.bin
> 2. ./test/data/serialization/4.0/utils.BloomFilter1000.bin
> 3. ./doc/modules/cassandra/examples/BASH/*.sh
> 4. ./pylib/Dockerfile.ubuntu.*
> 5. ./lib/cassandra-driver-internal-only-3.25.0.zip
> 6. ./lib/cassandra-driver-3.25.0/cassandra/murmur3.py
> 7. ./lib/cassandra-driver-3.25.0/cassandra/io/asyncioreactor.py
> 8. ./lib/cassandra-driver-3.25.0/cassandra/io/libevwrapper.c
> 9. ./tools/fqltool/src/org/apache/cassandra/fqltool/commands/Dump.java
> 10. ./src/java/org/apache/cassandra/net/Crc.java
> 11. https://www.apache.org/legal/resolved.html#cc-by
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread Henrik Ingo
It seems I was volunteered to rebase the Astra implementation of this
functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
of course.) I'll try to get going today or tomorrow, so that this
discussion can then benefit from having that code available for inspection,
and potentially using it as a solution to this use case.
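
For those who haven't worked with the JDK abstraction Ariel and Benedict
mention below, here is a rough usage-level sketch (the "s3" scheme, bucket
URI and paths are hypothetical, and assume such a provider is installed):
once a FileSystemProvider is registered for a scheme, existing
java.nio.file code works unchanged against the remote backend.

    import java.net.URI;
    import java.nio.file.*;
    import java.util.Map;

    public class RemoteFsSketch
    {
        public static void main(String[] args) throws Exception
        {
            // Assumes a FileSystemProvider registered for the "s3" scheme.
            FileSystem fs = FileSystems.newFileSystem(URI.create("s3://my-bucket/"), Map.of());
            Path sstable = fs.getPath("/node1/ks/tbl/nb-1-big-Data.db");
            // Regular file APIs route through the provider, local or remote alike.
            try (var channel = Files.newByteChannel(sstable, StandardOpenOption.READ))
            {
                System.out.println("size=" + channel.size());
            }
        }
    }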

On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:

> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> Filesystems). There will probably also be backend specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to another disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing of the moved data cannot be suspended.
> > The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>


-- 

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com

<https://www.facebook.com/datastax>  <https://twitter.com/datastax>
<https://www.linkedin.com/company/datastax/>  <https://github.com/datastax/>


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-06 Thread Henrik Ingo
> single partition, or layer a complex state machine on top of the
> database. These are sophisticated and costly activities that our users
> should not be expected to undertake. Since distributed databases are
> beginning to offer distributed transactions with fewer caveats, it is past
> time for Cassandra to do so as well.
>
> This CEP proposes the use of several novel techniques that build upon
> research (that followed EPaxos) to deliver (non-interactive) general
> purpose distributed transactions. The approach is outlined in the wikipage
> and in more detail in the linked whitepaper. Importantly, by adopting this
> approach we will be the _only_ distributed database to offer global,
> scalable, strict serializable transactions in one wide area round-trip.
> This would represent a significant improvement in the state of the art,
> both in the academic literature and in commercial or open source offerings.
>
> This work has been partially realised in a prototype. This partial
> prototype has been verified against Jepsen.io’s Maelstrom library and
> dedicated in-tree strict serializability verification tools, but much work
> remains for the work to be production capable and integrated into Cassandra.
>
> I propose including the prototype in the project as a new source
> repository, to be developed as a standalone library for integration into
> Cassandra. I hope the community sees the important value proposition of
> this proposal, and will adopt the CEP after this discussion, so that the
> library and its integration into Cassandra can be developed in parallel and
> with the involvement of the wider community.
>


-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Henrik Ingo
On Tue, Sep 7, 2021 at 1:31 AM bened...@apache.org 
wrote:

>
> Of course, but we may have to be selective in our back-and-forth. We can
> always take some discussion off-list to keep it manageable.
>
>
I'll try to converge. Sorry if a few comments were a bit "editorial" in the
first message. I find that sometimes it pays off to also ask the dumb
questions, as long as we don't get stuck on any of them.


> > The algorithm is hard to read since you omit the roles of the
> participants.
>
> Thanks. I will consider how I might make it clearer that the portions of
> the algorithm that execute on receipt of messages that may only be received
> by replicas, are indeed executed by those replicas.
>
>
In fact the same algorithm in the CEP was easier to read exactly because of
this, I now realize.


> > So I guess my question is how and when reads happen?
>
> I think this is reasonably well specified in the protocol and, since it’s
> unclear what you’ve found confusing, I don’t know it would be productive to
> try to explain it again here on list. You can look at the prototype, if
> Java is easier for you to parse, as it is of course fully specified there
> with no ambiguity. Or we can discuss off list, or perhaps on the community
> slack channel.
>
>
Maybe my question was a bit too open-ended, as I didn't want to lead it in
any specific direction.

I can of course tell where reads happen in the execution algorithm. What I
would like to understand better and without guessing is, what do these
transactions look like from a client/user point of view? You already
confirmed that interactive transactions aren't intended by this proposal.
At the other end of the spectrum, given that this is a Cassandra
Enhancement Proposal, and the CEP does in fact state this, it seems like
providing equivalent functionality to already existing LWT is a goal. So my
question is whether I should just* think of this as "better and more
efficient LWT" or is there something more? Would this CEP or follow-up work
introduce any new CQL syntax, for example?

To give just one more example of the kind of questions I'm triangulating
at: Suppose I wanted to do a long running read-only transaction, such as
querying a secondary index. Like SERIAL in current Cassandra, but taking
seconds to execute and returning thousands of rows. How would you see the
possibilities and limits of such operations in Accord?

*) Should emphasize that better scaling LWTs isn't just "just". If I
imagine a future Cassandra cluster where all reads and writes are
transactional and therefore strict serializable, that would be quite a
change from today.

henrik


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Henrik Ingo
On Tue, Sep 7, 2021 at 12:26 PM bened...@apache.org 
wrote:

> > whether I should just* think of this as "better and more efficient LWT”
>
> So, the LWT concept is a Cassandra one and doesn’t have an agreed-upon
> definition. My understanding of a core feature/limitation of LWTs is that
> they operate over a single partition, and as a result many operations are
> impossible even in multiple rounds without complex distributed state
> machines. The core improvement here, besides improved performance, is that
> we will be able to operate over any set of keys at-once.
>
>
My bad, I have never used LWT and forgot / didn't know they were single
partition. The CEP makes more sense now.



> How this facility is evolved into user-facing capabilities is an
> open-ended question. Initially of course we will at least support the same
> syntax but remove the restriction on operating over a single partition. I
> haven’t thought about this much, as the CEP is primarily for enabling
> works, but I think we will want to expand the syntax in two ways:
>
>  1) to support more complex conditions (simple AND conditions across all
> partitions seem likely too restrictive, though they might make sense for
> the single partition case);
>   2) to support inserting data from one row into another, potentially with
> transformations being applied (including via UDFs).
>
> These are both relatively manageable improvements that we might want to
> land in the same major release as the transactions themselves. The core
> facility can be expanded quite broadly, though. It would be possible for
> instance to support some interpreted language(s) as part of a query, so
> that arbitrary work can be applied in the transaction.
>

I was thinking that a path similar to Calvin/FaunaDB is certainly looming
on the horizon at least. I've been following those with interest, because
a) it's refreshingly outside of the box thinking, and b) they seem to be
able to push the limitations of this approach much beyond what one might
imagine when reading about it the first time. But like you also point out,
it remains to be seen whether users actually want those kinds of
transactions. We are creatures of habit for sure.



> Or, perhaps the community would rather build atop the feature to support
> interactive transactions at the client. I can’t predict resourcing for
> this, though, and it might be a community effort. I think it would be quite
> tractable once this work lands, however.
>
> > Suppose I wanted to do a long running read-only transaction
>
> So, there’s two sides to this: with and without paging. A long running
> read-only transaction taking a few seconds is quite likely to be fine and
> we will probably support with some MVCC within the transaction system
> itself. This may or may not be part of v1, it’s hard to predict with
> certainty as this is going to be a large undertaking.
>
> But for paged queries we’d be talking about SNAPSHOT isolation. This is
> likely to be something the community wants to support before long anyway
> and is probably not as hard as you might think. It is probably outside of
> the scope of this work, though the two would dovetail very nicely.
>

I've pointed out to some of my colleagues that since Cassandra's storage
engine is an LSM engine, with some additional work it could become an MVCC
style storage engine. Your thinking here seems to be in the same direction,
even if it's beyond version 1. (Just for context, also for the benefit of other
readers on the list, it took MongoDB 5 years and 6 major releases to
develop distributed multi-shard transactions. So it's good to talk about
the general direction, but understanding that this is not something anyone
will finish before Christmas.)

It seems to me at that point long running queries and interactive
transactions are mostly the same problem.



Benedict, thanks for the answers. Since I'm not a Cassandra developer I
feel it would be inappropriate for me to express an opinion for or against,
so I'll just end with saying this is an interesting proposal and the
authors have done a good job pulling together ingredients from state of the
art work in this area. As such it will be interesting to follow the
discussion and work from whitepaper to implementation.


A secondary objective was also to just let everyone know I am lurking here.
If you ever want to reach out for an out-of-band discussion, you now have my
contact details.

henrik


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-07 Thread Henrik Ingo
On Tue, Sep 7, 2021 at 5:06 PM bened...@apache.org 
wrote:

> > I was thinking that a path similar to Calvin/FaunaDB is certainly
> looming in the horizon at least.
>
> I’m not sure which aspect of these systems you are referring to. Unless I
> have misunderstood, I consider them to be strictly inferior approaches
> (particularly for Cassandra) as they require a _global_ leader process and
> as a result have scalability limits. Users simply shift the sharding
> problem to the cluster level rather than the node level, but the
> fundamental problem remains. This may be acceptable for many users, but was
> contrary to the goals of this CEP.
>

Oh yes. For sure it's one of the strengths of the CEP that it is clearly
designed to fit well into the existing Cassandra architecture and
experience.

I was referring to the property that Calvin transactions also need to be
sent to the cluster in a single shot, but then they have extended the
functionality by allowing programming logic to be executed inside the
transaction. (Like a stored procedure, if you will.) So the transactions
can be multi-statement with complex logic, they just can't communicate
outside the cluster - such as back and forth between client and server.
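
As a concrete illustration (my own sketch, names invented - this is not
Calvin's or Accord's actual API), a single-shot transaction ships all of
its logic with the request, so conditionals run inside the cluster without
any mid-transaction round trip to the client:

    import java.util.function.Function;

    // Minimal sketch of the execution context a single-shot transaction sees.
    interface TxnContext
    {
        long read(String key);
        void write(String key, long value);
    }

    // The whole transaction, including its branching logic, is submitted at once.
    final class TransferIfSufficient implements Function<TxnContext, Boolean>
    {
        public Boolean apply(TxnContext ctx)
        {
            long from = ctx.read("account:a");
            if (from < 100)
                return false; // the condition is evaluated inside the cluster
            ctx.write("account:a", from - 100);
            ctx.write("account:b", ctx.read("account:b") + 100);
            return true;
        }
    }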


> > good job pulling together ingredients from state of the art work in this
> area
>
> In case this was lost in the noise: this work is not simply an assembly of
> prior work. It introduces entirely novel approaches that permit the work to
> exceed the capabilities of any prior research or production system. It is
> worth properly highlighting that if we deliver this, Cassandra will have
> the most sophisticated transaction system full stop.
>
>
Of course. Maybe it's just me, but I'm at least equally impressed by the
"level of education" the authors show in not reinventing the wheel for the
details where copying a feature, or at least being inspired by one, from
some existing publication or implementation was possible. Knowing what to
keep vs what you want to improve isn't easy. Also, it makes the whitepaper
an interesting read when in addition to learning about Accord I also
learned about several other systems that I hadn't previously read about.

henrik


Re: [DISCUSS] CEP-7 Storage Attached Index

2021-09-16 Thread Henrik Ingo
; > > > > > could
> > > > > > > >
> > > > > > > > "triage
> > > > > > > >
> > > > > > > > away"?
> > > > > > > >
> > > > > > > > I believe most of the known bugs in 2i/SASI either
> > > > > > > >
> > > > > > > > have
> > > > > > > >
> > > > > > > > been
> > > > > > > >
> > > > > > > > addressed
> > > > > > > >
> > > > > > > > in
> > > > > > > >
> > > > > > > > SAI or
> > > > > > > >
> > > > > > > > don't apply to SAI.
> > > > > > > >
> > > > > > > > And, is it time for the project to start
> > > > > > > >
> > > > > > > > introducing new
> > > > > > > >
> > > > > > > > SPI
> > > > > > > >
> > > > > > > > implementations as separate sub-modules and jar
> > > > > > > >
> > > > > > > > files
> > > > > > > >
> > > > > > > > that
> > > > > > > >
> > > > > > > > are
> > > > > > > >
> > > > > > > > only
> > > > > > > >
> > > > > > > > loaded
> > > > > > > >
> > > > > > > > at runtime based on configuration settings? (sorry
> > > >

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-21 Thread Henrik Ingo
On client generated timestamps...

On Mon, Sep 20, 2021 at 7:17 PM Joseph Lynch  wrote:

> * Relatedly I'm curious if there is any way that the client can
> acquire the timestamp used by the transaction before sending the data
> so we can make the operations idempotent and unrelated to the
> coordinator that was executing them as the storage nodes are
> vulnerable to disk and heap failure modes which makes them much more
> likely to enter grey failure (slow). Alternatively, perhaps it would
> make sense to introduce a set of optional dedicated C* nodes for
> reaching consensus that do not act as storage nodes so we don't have
> to worry about hanging coordinators (join_ring=false?)?
>


I thought about this myself some time ago. The answer is yes, the client
could generate its own timestamps, provided that the client is also in sync
with the clock of the cluster. The coordinator that receives the
transaction from the client would simply need to enforce that the client
generated timestamp is within the margin that would be acceptable if the
coordinator itself had generated the timestamp. In addition, the
coordinator must ensure that the transaction id is unique.

But... This still wouldn't give you idempotency in itself. This is because
if something failed with the transaction, you cannot resend the same
timestamp later, because it would now be outside the acceptable range of
timestamps. (Expired, if you will.) At best maybe the client could somehow
use the (timestamp, id) to query a node to verify whether such a
transaction was recently committed. I'm unsure whether that's convenient
for a user though.
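
In pseudo-Java, the coordinator-side check I have in mind looks roughly
like this (the names and the skew bound are my own assumptions, not from
the CEP):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    final class ClientTimestampValidator
    {
        private static final long MAX_SKEW_MICROS = 10_000; // assumed bound, ~10ms
        private final Set<String> recentTxnIds = ConcurrentHashMap.newKeySet();

        boolean accept(long clientTimestampMicros, String txnId)
        {
            long nowMicros = System.currentTimeMillis() * 1000;
            // Only accept what the coordinator could have generated itself.
            if (Math.abs(nowMicros - clientTimestampMicros) > MAX_SKEW_MICROS)
                return false; // too old or too far ahead - effectively "expired"
            // Reject duplicate ids; note this alone does not make retries
            // idempotent, for the reason given above.
            return recentTxnIds.add(txnId);
        }
    }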

henrik

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread Henrik Ingo
ports 60-70k TPM
for a non-standard TPC-C where varying client threads was allowed and
schema was modified to take advantage of denormalization in a document
model. I'm not aware of benchmarks for cross shard transactions, nor would
I expect such results to be great. The philosophy there has been that cross
shard transactions are expected to be a minority.


Functionality and limitations: MongoDB's approach has been similar in
spirit to what we can observe in the RDBMS market. Even if MySQL (since
forever) and PostgreSQL (2011) provide serializable isolation, it is not
the default, and it's hard to find a single user who ever wanted to use it.
Snapshot Isolation and Causal Consistency are considered the optimal
tradeoff between good consistency, good performance, and minimal hassle
with aborted transactions. The typical MongoDB user, like the typical
MySQL and PostgreSQL user, is happy with this. It is possible to emulate
SELECT FOR UPDATE by using findAndModify, which will turn your reads into
writes and therefore take a write lock on all touched records.

Note that the first versions of MongoDB transactions got a quite bad Jepsen
review. This was mostly a function of none of the above guarantees being
the default, and of the client API being really confusing, so most users -
including Kyle Kingsbury and yours truly - would struggle to get all the
parameters right to actually enjoy the above-mentioned guarantees. This is
a sober reminder that this is complex stuff to get right end to end.

Note that MongoDB also supports linearizable writes and reads, but only on
a per-record basis. Linearizable is not available for transactions.

It should be noted MongoDB's approach allows for interactive transactions.


Application to Cassandra: D.

Replication being leader based is a poor fit for the expectations of a
typical Cassandra user. It's hard to predict whether a typical Cassandra
workload can expect cross-partition transactions to be the exceptional
case, but my instinct says no. The Lamport clock, and the causal
consistency it provides, is simple to understand and could be a building
block in a transactional Cassandra cluster. My personal opinion is that a
"synchronized timestamp" (or Hybrid Logical Clock, I guess?) scheme like
the one in Accord is more familiar to current Cassandra semantics.
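
For readers who haven't bumped into HLCs, here is a deliberately simplified
sketch (my own illustration, not Accord's actual implementation): each
timestamp is the max of physical time and the last issued value plus one,
so timestamps never go backwards and still stay close to wall clock time.

    import java.util.concurrent.atomic.AtomicLong;

    final class HybridLogicalClock
    {
        private final AtomicLong last = new AtomicLong();

        long nowMicros()
        {
            return last.updateAndGet(prev ->
                Math.max(prev + 1, System.currentTimeMillis() * 1000));
        }

        // On receiving a remote timestamp, advance past it to preserve causality.
        void observe(long remoteMicros)
        {
            last.updateAndGet(prev -> Math.max(prev, remoteMicros));
        }
    }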

henrik
-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-22 Thread Henrik Ingo
On Wed, Sep 22, 2021 at 7:56 AM bened...@apache.org 
wrote:

> Could you explain why you believe this trade-off is necessary? We can
> support full SQL just fine with Accord, and I hope that we eventually do so.
>

I assume this is really referring to interactive transactions = multiple
round trips to the client within a transaction.

You mentioned previously we could later build a more MVCC like transaction
semantic on top of Accord. (Independent reads from a single snapshot,
followed by a commit using Accord.) In this case I think the relevant
discussion is whether Accord is still the optimal building block
performance wise to do so, or whether users would then have lower
consistency level but still pay the performance cost of a stricter
consistency level.

henrik
-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
the SkewMax window, because we can assume that the trx timestamps
originating at the same coordinator cannot arrive out of order when using
TPC?



henrik







On Mon, Sep 27, 2021 at 11:59 PM bened...@apache.org 
wrote:

> Ok, it’s time for the weekly poking of the hornet’s nest.
>
> Any more thoughts, questions or criticisms, anyone?
>
> From: bened...@apache.org 
> Date: Friday, 24 September 2021 at 22:41
> To: dev@cassandra.apache.org 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> I’m not aware of anybody having taken any notes, but somebody please chime
> in if I’m wrong.
>
> From my recollection, re Accord:
>
>
>   *   Q: Will batches now support rollbacks?
>  *   Batches would apply atomically or not, but unlikely to have a
> concept of rollback. Timeouts remain unknown, but hope to have some
> mechanism to provide clients a definitive answer about such transactions
> after the fact.
>   *   Q: Can stale replicas participate in transactions?
>  *   Accord applies conflicting transactions in-order at every
> replica, so only nodes that are up-to-date may participate in the execution
> of a transaction, but any replica may participate in agreeing a
> transaction. To ensure replicas remain up-to-date I anticipate introducing
> a real-time repair facility at the transactional message level, with peers
> reconciling recently processed messages and cross-delivering any that are
> missing.
>   *   Possible UX directions in very vague terms: CQL atomic and
> conditional batches initially; going forwards interactive transactions?
> Complex user defined functions? SQL?
>   *   Discussed possibility of LOCAL_QUORUM reads for globally replicated
> transactional tables, as this is an important use case
>  *   Simple stale reads to transactional tables
>  *   Brainstormed a bit about serializable reads to a single DC
> without (normally) crossing WAN
>  *   Discussed possibility of multiple ACKs providing separate LAN and
> WAN persistence notifications to clients
>   *   Discussed size of fast path quorums in Accord, and how this might
> affect global latency in high RF clusters (i.e. not optimal, and in some
> cases may need every DC to participate) and how this can be modified by
> biasing fast path electorate so that 2 of the 3 DCs may reach fast-path
> decisions with each other (remaining DC having to reach both those DCs to
> reach fast path). Also discussed Calvin-like modes of operation that would
> offer optimal global latency for sufficiently small clusters at RF=3 or
> RF=5.
>
> I’m sure there were other discussions I can’t remember, perhaps others can
> fill in the blanks.
>
>
> From: Jonathan Ellis 
> Date: Friday, 24 September 2021 at 20:28
> To: dev 
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> Does anyone have notes for those of us who couldn't make the call?
>
> On Wed, Sep 22, 2021 at 1:35 PM bened...@apache.org 
> wrote:
>
> > Hi everyone,
> >
> > Joey has helpfully arranged a call for tomorrow at 8am PST / 10am CST /
> > 4pm BST to discuss Accord and other things in the community. There are no
> > plans to make any kind of project decisions. Everyone is welcome to drop
> in
> > to discuss Accord or whatever else might be on your mind.
> >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__gather.town_app_2UKSboSjqKXIXliE_ac2021-2Dcass-2Dsocial&d=DwIF-g&c=adz96Xi0w1RHqtPMowiL2g&r=eYcKRCU2ISzgciHbxg_tERbSQOZMMscdGLftkLqUuXo&m=yN7Y6u6BfW9NUZaSousZnD2Y-WiBtM1xDeJNy2WEq_r-gZqFwHVT4IPaeMOUa-AF&s=cgKblfbz9lUghSPbj5Si7oM7RsZy1w9vfvWjyzL8MXs&e=
> >
> >
> > From: bened...@apache.org 
> > Date: Wednesday, 22 September 2021 at 16:22
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > No, I would expect to deliver strict serializable interactive
> transactions
> > using Accord. These would simply corroborate that the participating keys
> > had not modified their write timestamps during the final transaction.
> These
> > could even be undertaken with still only a single wide area round-trip,
> > using local copies of the data to assemble the transaction (though this
> > would marginally increase the chance of aborts)
> >
> > My goal for MVCC is parallelism, not additional isolation levels (though
> > snapshot isolation is useful and we’ll probably also want to offer that
> > eventually)
> >
> > From: Henrik Ingo 
> > Date: Wednesday, 22 September 2021 at 15:15
> > To: dev@cassandra.apache.org 
> > Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> > On Wed, Sep 22, 2021 at 7:56 AM bened...@apa

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
On Fri, Oct 1, 2021 at 4:37 PM Henrik Ingo  wrote:

> A known optimization for the hot rows problem is to "hint" or manually
> force clients to direct all updates to the hot row to the same node,
> essentially making the system leader based. This allows the database to
> start processing new updates even while the first one is still committing.
> (See Galera for an example implementing this
> <https://galeracluster.com/library/documentation/using-sr.html#usr-hot-records>.)
> This makes me wonder whether there is a similar optimization for Accord
> where transactions from the same coordinator can be allowed to commit
> within the SkewMax window, because we can assume that the trx timestamps
> originating at the same coordinator cannot arrive out of order when using
> TPC?
>
>
TCP



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
On Fri, Oct 1, 2021 at 5:30 PM bened...@apache.org 
wrote:

> > Typical value for SkewMax in e.g. the Spanner paper, some CockroachDB
> discussions = 7 ms
>
> I think skew max is likely to be much lower than this, even on commodity
> hardware. Bear in mind that unlike Cockroach and Spanner correctness does
> not depend on this value, only performance. So we can pick the real number,
> not some p100 outlier value.
>
> Also bear in mind that this is an optimisation. In clusters where it makes
> no sense we can simply use the raw protocol and accept transactions will
> very infrequently take two round-trips (which is fine, because in this
> scenario round-trips are cheap).
>
>
Oh, this was not at all obvious :-D

If I'm reading you correctly, then Accord does / could do exactly what I
was asking for: two round trips in a single DC cluster, and one roundtrip +
SkewMax when network roundtrips are >> SkewMax.



> > A known optimization for the hot rows problem is to "hint" or manually
> force clients to direct all updates to the hot row to the same node
>
> So, with a leaderless protocol like Accord the ordering decisions are
> never really bottlenecked - no matter how many are in-flight, a new
> transaction will experience no additional latency determining its execution
> order. The only bottleneck will be execution. For this it is absolutely
> possible to funnel everything to a single coordinator, but I don’t know
> that this would in practice achieve much – the important bottleneck would
> be that the coordinators are all within the same
>
> DC, so that the _replicas_ may all respond to them with their data
> dependencies with minimal delay. This is something we discussed in the
> ApacheCon call as it happens. If a significant number of transactions are
> pending, and they are in different DCs, it would be quite straightforward
> to nominate a coordinator within the DC serving the majority of operations
> to serve the remainder, and to forward the results to the original
> coordinators.
>
>
Thanks for explaining. This is really interesting. I now reread section 2.2
of the paper and realize it says exactly this.

So in Accord:

Step 1: One network round trip + SkewMax to establish a global ordering.

Step 2: a) One (local) network round trip for the read phase, one (WAN)
round trip for writes.
 b) In addition, before either reading or writing, the node
must first commit and apply all previous transactions that are in the
"deps" set of this transaction.

In addition, if we implement interactive transactions, or support for
secondary indexes, or other "complex" transactions, then that work would
happen before Step 1.

Ok, now that I spelled this out... assuming I got it correct... then this
actually resembles Galera more than Spanner. The wall clock time is not
actually the transaction id, it's just a step in the consensus dialogue
where nodes agree on a global ordering.



> I don’t anticipate this optimisation being a high priority until we have
> user reports of this bottleneck in the wild, however. Since clients for
> many workloads will naturally be geo-partitioned so that related state is
> being updated from the same region, it might simply not be needed – at
> least any time soon.
>
>
For sure. I think we're all just trying to understand the landscape of what
we are talking about here, not trying to say everything should be implemented
in v1.


henrik



Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-01 Thread Henrik Ingo
On Fri, Oct 1, 2021 at 7:20 PM bened...@apache.org 
wrote:

> I haven’t encountered Galera – do you have any technical papers to hand?
>
>
Yes, but it's a whole thesis :-)

https://www.inf.usi.ch/faculty/pedone/Paper/199x/These-2090-Pedone.pdf

I guess parts of that were presented in conference papers.

Pedone's work implements a protocol with Snapshot Isolation. More recent
work from down under describes a similar system providing Serializable
Snapshot Isolation:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.185&rep=rep1&type=pdf


The best known implementation of Pedone's work would be Galera Cluster,
which hooks the "Galera" replication library into MySQL. It's also included
with MariaDB Cluster and Percona XtraDB Cluster. Oracle later did an
independent implementation (for IPR ownership reasons) which is known as
InnoDB Cluster.

This page in the Galera docs has a great diagram to get you started:
https://galeracluster.com/library/documentation/certification-based-replication.html

For an end-user-oriented beginner lecture, search conference video
recordings for Seppo Jaakola:
https://www.youtube.com/watch?v=5e3unwy_OVs


Worth calling out that we are in RDBMS land now, and the above is just a
replication solution; there is no sharding anywhere. For the Serializable
paper, I struggle to even imagine how it could scale to multiple shards.
For SI it's kind of easier, as only write conflicts need to be checked.

henrik





Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Henrik Ingo
...performance would be "best case"
in the sense that I would expect Snapshot and Serializable to have worse
performance, but that overhead can be considered inherent in the
isolation level rather than a fault of Accord.

* Implementing READ COMMITTED transactions on top of Accord is rather
straightforward and can be described and discussed in this email thread,
which could hopefully contribute to our understanding of the problem space.
(Could also be a real CEP, if we think it's a useful first step for
interactive transactions, but for now I'm dumping it here just to try to
bring a concrete example into the discussion.)



Goal: READ COMMITTED interactive transactions

Dependency: Assume a Cassandra database with CEP-15 implemented.


Approach: The conversational part of the transaction is a sequence of
regular Cassandra reads and writes. Mutations are however executed as
read-queries toward the database nodes. Database state isn't modified
during the conversational phase, rather the primary keys of the
to-be-mutated rows are stored for later use. Accord is essentially the
commit phase of the transaction. All primary keys to be updated are the
write set of the Accord transaction. There's no need to re-execute the
reads, so the read set is empty.

We define READ COMMITTED as "whatever is returned by Cassandra when
executing the query (with QUORUM consistency)". In other words, this
functionality doesn't require any changes to the storage engine or other
fundamental changes to Cassandra. The Accord commit is guaranteed to
succeed per design and the READ COMMITTED transaction doesn't add any
additional checks for conflicts. As such, this functionality remains
abort-free.


Proposed Changes: A transaction manager is added to the coordinator, with
the following functionality:

BEGIN - initialize transaction state in the coordinator. After a BEGIN
statement, the following commands are modified as follows:

INSERT, UPDATE, DELETE: Transform to an equivalent SELECT, returning the
primary key columns. Store the original command (INSERT, etc…) and the
returned primary keys into the write set.

SELECT - no changes, except for read your own writes. The results of a
SELECT query are returned to the client, but there's no need to store the
results in the transaction state.

Transaction reads its own writes - For each SELECT the coordinator will
overlay the current write set onto the query results. You can think of the
write set as another memtable at Level -1.

Secondary indexes are supported without any additional work needed.

COMMIT - Perform a regular Accord transaction, using the above write set as
the Accord write set. The read set is empty. The commit is guaranteed to
succeed. In the end, clear state on the coordinator.

New interfaces: BEGIN and COMMIT. ROLLBACK. Maybe some command to declare
READ COMMITTED isolation level and to get the current isolation level.
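
To make the flow above concrete, here is a minimal coordinator-side sketch
in Python. Everything in it is hypothetical: quorum_read, accord_commit and
ReadCommittedTxn are placeholder names invented for illustration, not real
Cassandra or Accord APIs, and it glosses over INSERTs of brand-new keys and
predicate filtering on the overlay.

def quorum_read(select_cql):
    """Stand-in for a regular QUORUM read; returns {primary_key: row}."""
    return {}

def accord_commit(read_set, write_set):
    """Stand-in for the final one-shot Accord transaction."""
    pass

class ReadCommittedTxn:
    def __init__(self):              # BEGIN
        self.write_set = {}          # primary key -> buffered column values

    def mutate(self, new_values, equivalent_select_cql):
        # INSERT/UPDATE/DELETE run as a SELECT returning primary keys;
        # the intended new values are buffered, nothing is applied yet.
        for pk in quorum_read(equivalent_select_cql):
            self.write_set[pk] = new_values

    def select(self, select_cql):
        rows = dict(quorum_read(select_cql))
        # Read-your-own-writes: overlay the buffered write set on the
        # results, treating it as a memtable at "level -1".
        for pk, values in self.write_set.items():
            rows[pk] = {**rows.get(pk, {}), **values}
        return rows

    def commit(self):                # COMMIT
        # Empty read set: the Accord commit cannot conflict, which keeps
        # the transaction abort-free as described above.
        accord_commit(read_set=[], write_set=self.write_set)
        self.write_set = {}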


Future work: A motivation for the above proposal is that the same scheme
could be extended to support SNAPSHOT ISOLATION transactions. This would
require MVCC support from the storage engine.



---

It would be interesting to hear from list members whether the above
reflects a correct understanding of Accord (and SQL), or whether I'm missing
something.

henrik




Re: Tradeoffs for Cassandra transaction management

2021-10-12 Thread Henrik Ingo
On Tue, Oct 12, 2021 at 11:54 PM Henrik Ingo 
wrote:

> Secondary indexes are supported without any additional work needed.
>
Correction: The "transaction reads its own writes" feature would also
require storing secondary index keys in the transaction state. These of
course needn't be part of the write set in the commit.



[IDEA] Read committed transaction with Accord

2021-10-13 Thread Henrik Ingo
On Wed, Oct 13, 2021 at 1:26 AM Blake Eggleston
 wrote:

> Hi Henrik,
>
> I would agree that the local serial experience for valid use cases should
> be supported in some form before legacy LWT is replaced by Accord.
>
>
Great! It seems there's a seed of consensus on this point.


> Regarding your read committed proposal, I think this CEP discussion has
> already spent too much time talking about hypothetical SQL implementations,
> and I’d like to avoid veering off course again. However, since you’ve asked
> a well thought out question with concrete goals and implementation ideas,
> I’m happy to answer it. I just ask that if you want to discuss it beyond my
> reply, you start a separate ‘[IDEA] Read committed transaction with Accord’
> thread where we could talk about it a bit more without it feeling like we
> need to delay a vote.
>
>
This is a reasonable request. We were already in a side thread I guess, but
I like organizing discussions into separate threads...

Let's see if I manage to break the thread correctly simply by editing the
subject...

FWIW, my hope for this discussion was that by providing a simple yet
concrete example, it would facilitate the discussion toward a CEP-15 vote,
not distract from it. As it happened, Alex Miller was writing a hugely
helpful email concurrently with mine, which improves details in CEP-15, so
I don't know if expecting the discussion to die out just yet is ignoring
people who may be working off list to still understand this rather advanced
reading material.



> So I think it could work with some modifications.
>
> First you’d need to perform your select statements as accord reads, not
> quorum reads. Otherwise you may not see writes that have been (or could
> have been) committed. A multi-partition write could also appear to become
> undone, if a write commit has not reached one of the keys or needs to be
> recovered.
>

Ah right. I think we established early on that tables should be either
Accord-only, or legacy C* only. I was too fixated on the "no other changes"
and forgot this.

This is then a very interesting detail you point out! It seems like
potentially every statement now needs to go through the Accord consensus
protocol, and this could become expensive, whereas my goal was to design the
simplest and most lightweight example thinkable. BUT for read-only Accord
transactions, where I specifically also don't care about serializability,
wouldn't this be precisely the case where I can simply pick my own
timestamp and do a stale read from a nearby replica?


>
> Second, when you talk about transforming mutations, I’m assuming you’re
> talking about confirming primary keys do or do not exist,


No, I was thinking more broadly of operations like `UPDATE table1 SET
column1=x WHERE pk >= 10 and pk <= 20`

My thinking was that I need to know the exact primary keys touched both
during the conversational phase and the commit phase. In essence, this is
an interactive reconnaissance phase.
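
Concretely, sketching that reconnaissance step for the statement above: the
coordinator would run something like `SELECT pk FROM table1 WHERE pk >= 10
AND pk <= 20` at QUORUM and record the returned keys in the write set,
instead of applying the update directly.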

You make a great point that for statements where the PK is explicit, they
can just be directly added to the write set and transaction state. Ex:
`UPDATE table1 SET column1=x WHERE pk IN (1,2,3)`



> and supporting auto-incrementing primary keys. To confirm primary keys do
> or do not exist, you’d also need to perform an accord read also.


For sure.


> For auto-incrementing primary keys, you’d need to do an accord read/write
> operation to increment a counter somewhere (or just use uuids).
>
>
I had not considered auto-increment at all, but if that would be a
requirement, then I tend to translate "auto-increment" into "any service
that can hand out unique integers". (In practice, no database can force me
to commit the integers in the order in which they were issued, so the
monotonicity of "auto-increment" is an illusion, I realized at some point in
my career.)


> Finally, read committed does lock rows, so you’d still need to perform a
> read on commit to confirm that the rows being written to haven’t been
> modified since the transaction began.
>

Hmm...

As a separate discussion is already diving into this, it seems that at
least the SQL-92 standard only says read committed must protect against P1,
and that's it. My suspicion is that since most modern databases
start from MVCC, they essentially "over deliver" when providing read
committed, since the implementation naturally provides snapshot reads and
in fact it would be complicated to do something less consistent.

For this discussion it's not really important which interpretation is
correct, since either is a reasonable semantic. For my purposes I'll just
note that needing to re-execute all reads during the Accord phase (commit
phase) would make the design more expensive, since the transaction is now
execute

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-13 Thread Henrik Ingo
On Wed, Oct 13, 2021 at 12:25 AM Alex Miller  wrote:

> I have, purely out of laziness, been engaging on this topic on ASF Slack as
> opposed to dev@[1].  Benedict has been overly generous in answering
> questions and considering future optimizations there, but it means that I
> inadvertently forked the conversation on this topic.  To bring the
> highlights of that conversation back to the dev list:
>
>
Thanks for contributing to the discussion Alex! Your points and experience
seem rather valuable.




> == Interactive Transactions
>
>
Heh, it seems we sent these almost concurrently :-) Thanks for contributing
this. I think for many readers debating concrete examples is easier, even
if we are talking about future opportunities that are not in scope for the
CEP. It helps to see a path forward.


We also had a bit of discussion over implementation constraints on the
> conflict checking.  Without supporting optimistic transactions, Accord only
> needs to keep track of the read/write sets of transactions which are still
> in flight.  To support optimistic transactions, Accord would need to
> bookkeep the most recent timestamp at which the key was modified, for every
> key.  There's some databases (e.g. CockroachDB, FoundationDB) which have a
> similar need, and use similar data structures which could be copied.
>
>
In the context of Cassandra, I had actually assumed the Accord timestamp
will be used as the cell timestamp for each value? Isn't something like
this needed for compaction to work correctly too?

Committing a transaction before execution means the database is committed
> to performing the deferred work of transaction execution.  In some fashion,
> the expressiveness and complexity of the query language needs to be
> constrained to place limitations on the execution time or resources. Fauna
> invented FQL with a specific set of limitations for a presumable reason.
> CQL seems to already be a reasonably limited query language that doesn't
> easily lend itself to succinctly expressing an incredulous amount of work,
> which would make it already reasonably suited as a query language for
> Accord.
>
>
Alternatively - in a future where the query language evolves to be more
complex - some backpressure mechanism seems necessary to throttle new
transactions while previously committed ones are still being applied. (For
those of you that started reading up on Galera from my previous email, see
"flow control")




> Any query which can't pre-declare its read and write sets must attempt to
> pre-execute enough of the query to determine them, and then submit the
> transaction as optimistic on all values read during the partial execution
> still being untouched.  Most notably, all workloads that utilize secondary
> indexes are affected, and degrade from being guaranteed to commit, to being
> optimistic and potentially requiring retries.  This transformed Calvin into
> an optimistic protocol, and one that's significantly less efficient than
> classic execute-then-commit designs.  Accord is similarly affected, though
> the window of optimism would likely be smaller.  However, it seems like
> most common ways to end up in this situation are already discouraged or
> prevented.  CQL's own restrictions prevent many forms of queries which
> result in unclear read and write sets.  In my highly limited Cassandra
> experience, I've generally seen Secondary Indexes be cautioned against
> already.
>
>
See CEP-7, which is independently proposing a new set of secondary indexes
that we hope will be usable.

Rather than needing to re-execute anything, in my head I had thought that
for Accord to support secondary indexes, the write set is extended to also
cover the secondary index keys read or modified. Essentially this is like
thinking of a secondary index as its own primary key. Mutations that change
indexed columns, would add both their PK to the write set, as well as the
secondary index keys it modified. A read query would then check its
dependencies against whatever indexes (PK, or secondary) it uses to execute
itself, and nothing more.

The above is saying that for a given snapshot/timestamp, the result of a
statement is equally well defined by the secondary index keys used as it is
by the primary keys returned from those secondary index keys.
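
A tiny Python sketch of that idea (purely illustrative, with made-up names;
this is not Accord's actual API): a mutation's conflict footprint would be
its partition key plus every secondary index entry it removes or adds.

def conflict_keys(pk, old_row, new_row, indexed_columns):
    # The write set always covers the primary key itself...
    keys = {("pk", pk)}
    # ...plus each secondary index entry the mutation removes or adds.
    for col in indexed_columns:
        old, new = old_row.get(col), new_row.get(col)
        if old != new:
            if old is not None:
                keys.add(("idx", col, old))   # index entry removed
            if new is not None:
                keys.add(("idx", col, new))   # index entry added
    return keys

# Changing an indexed column conflicts with reads through either value:
print(conflict_keys(1, {"city": "Oslo"}, {"city": "Turku"}, ["city"]))
# -> {('pk', 1), ('idx', 'city', 'Oslo'), ('idx', 'city', 'Turku')}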

henrik


Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread Henrik Ingo
Thank you Alex for your feedback; I greatly value these thoughts and always
enjoy learning new details in this space.

On Wed, Oct 13, 2021 at 10:07 AM Alex Miller  wrote:

> These two pieces together seem to imply that your claim is that Read
> Committed may read whatever the most recently committed data during
> the execution of the statement and does not require MVCC.  Though I
> agree that the standard[1] is very unclear as to what a "read" means
> when defining a non-repeatable read:
>

I responded to Blake's similar comment on this topic. Out of respect for
his request to move the discussion to a newly created thread, I will not
elaborate here but rather just reference my reply to Blake.

The following observation seems more relevant for Accord itself and the
discussion on trade-offs, so I'll allow myself to continue within this
thread:


>
> On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo 
> wrote:
> > Approach: The conversational part of the transaction is a sequence of
> > regular Cassandra reads and writes. Mutations are however executed as
> > read-queries toward the database nodes. Database state isn't modified
> > during the conversational phase, rather the primary keys of the
> > to-be-mutated rows are stored for later use. Accord is essentially the
> > commit phase of the transaction. All primary keys to be updated are the
> > write set of  the Accord transaction. There's no need to re-execute the
> > reads, so the read set is empty.
>
> As I've pondered this over time, I personally specifically fault
> read-your-uncommitted-writes as the reason why NewSQL databases are
> essentially a design monoculture.  Every database persists uncommitted
> writes in the database itself during execution.  Doing so encourages
> those writes to be re-used for concurrency control (ie. write
> intents), and then that places you in the exact client-driven 3PC
> protocol which Cockroach, TiDB, and YugaByte all implement.  Even if
> you find some radically different database for SQL, like leanXcale, it
> _still_ persists uncommitted writes into the database.
>
>
This is an interesting point of view... I simply assumed this approach is
inherited from the fact that single-server RDBMS implementations are built
this way, and NewSQL solutions reuse well-known designs for the database
engine.


> And every time I've thought through this, I tend to agree.  It's too
> exceedingly easy to write a SQL query which will exceed any local
> limit imposed by memory, and it's too easy to write a query which runs
> fine in production for a while, until it hits a memory limit and
> begins to immediately fail.  There's a tremendous implementation
> difference between `DELETE FROM Table` and `TRUNCATE Table`, and it's
> relatively hard to explain that to naive users.
>
> Memory constraints aside, merging a local write cache into remote data
> for execution seems like it'd be quite a task.   Any desire for
> efficient distributed query execution would push for a design where
> query fragments can be pushed down to the nodes holding the data.


Reading this I realize...

Aren't you actually pointing out a limitation in any "single shot"
transactional algorithm? Including Accord itself, without any interactive
part?

What you are saying is that an Accord transaction is limited by the need
for both the client and coordinator to be able to keep the entire
transaction in memory and process it?

Where Cassandra is coming from, I'm not particularly alarmed by this
limitation as I would expect operations on a Cassandra database to be fast
and small, but it's an important limitation to call out for sure. Indeed,
those who have been worried Accord will not be able to serve all
possible future use cases well may have found their first meaningful concrete
example to add to the list?

henrik


Re: [IDEA] Read committed transaction with Accord

2021-10-13 Thread Henrik Ingo
On Wed, Oct 13, 2021 at 3:38 PM bened...@apache.org 
wrote:

> It may have been lost in the back and forth (there’s been a lot of
> emails), but I outlined an approach for READ COMMITTED (and SERIALIZABLE)
> read isolation that does not require a WAN trip.


Yes, thank you. I knew I had read it but was looking for this in the
CEP/paper itself and couldn't find it when searching. Thanks for
re-explaining here.

However, for any multi-shard protocol it is unsafe to perform an
> uncoordinated read as each shard may have applied different transaction
> state – the timestamp you pick may be in the future on some shards.
>
>
Ok, fair enough.

Ah, I realize now we're talking past each other. With what I had in mind, I
did not even expect different rows or partitions to reflect the same
snapshot/timestamp. Just that each row independently reflects some state
that was committed. "Per row read committed", if you will. I realize now
that a) that requirement is probably trivially true for any read to any
node managed by Accord, and b) maybe it's not appropriate to call this read
committed after all. I agree that the spirit of read committed requires
that if 2 rows were modified by the same transaction, then a future read
including both rows, must show a consistent result of both the rows.


> For my purposes I'll just note that needing to re-execute all reads
> during the Accord phase (commit phase) would make the design more expensive
>
>
I realize now that the response this was part of incorporated the same
flawed thinking about "per row read committed" isolation. Thanks for
persisting in educating me.



> As noted by Alex, the only thing that needs to be corroborated in this
> approach is the timestamps. However, this is only necessary for
> SERIALIZABLE isolation or above. For READ COMMITTED it would be enough to
> perform reads with the above properties, and to buffer writes until a final
> Accord transaction round. However, as also noted by Alex, the difficulty
> here is read-your-writes. This is a general problem for interactive
> transactions and – like many of these properties - orthogonal to Accord. A
> simple approach would be to nominate a coordinator for the transaction that
> buffers writes and integrates them into the transaction execution. This
> might impose some restrictions on the size of the transaction we want to
> support, and of course means if the coordinator fails the transaction also
> fails.
>

But that is true for Cassandra today as well. (For a shorter time window,
anyway.)


>
> If we want to remove all restrictions, we are back at the monoculture of
> Cockroach, YugaByte et al. Which, again, may both be implemented by, and
> co-exist with, Accord.
>
> In this world, complex transactions would insert read and write intents
> using an Accord operation, along with a transaction state record. If
> transactions insert conflicting intents, the concurrency control
> arbitration mechanism decides what happens (whether one transaction blocks,
> aborts, or what have you). There is a bunch of literature on this that is
> orthogonal to the Accord discussion.
>
> In the case of READ COMMITTED, I believe this can be particularly simple –
> we don’t need any read intents, only write intents which are essentially a
> distributed write buffer. The storage system uses these to answer reads
> from the transaction that inserted the write intents, but they are ignored
> for all other transactions. In this case, there is no arbitration needed as
> there is never a need for one transaction to prevent another’s progress. A
> final Accord operation commits these write intents atomically across all
> shards, so that reads that occur after this operation integrate these
> writes, and those that executed before do not.
>
> Note that in this world, one-shot transactions may still execute without
> participating in the complex transaction system. In the READ COMMITTED
> world this is particularly simple, they may simply execute immediately
> using normal Accord operations. But it remains true even for SERIALIZABLE
> isolation, so long as there are no read or write intents to these keys. In
> this case we must validate there are no such intents, and if any are found
> we may need to upgrade to a complex transaction. This can be done
> atomically as part of the Accord operation, and then the transaction
> concurrency control arbitration mechanism kicks in.
>

And now I've teased you into outlining a different read committed
transaction approach :-)

henrik

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-13 Thread Henrik Ingo
On Wed, Oct 13, 2021 at 2:54 PM bened...@apache.org 
wrote:

> I think this is a blurring of lines of systems however. I _think_ the
> point Alex is making (correct me if I’m wrong) is that the transaction
> system will need to track the transaction timestamps that were witnessed by
> each read for each key, in order to verify that they remain valid on
> commit.


Isn't it sufficient to simply verify that there were no conflicting writes
between the transaction's start timestamp and its commit timestamp?

I can imagine verifying the timestamp of each row or cell could result in
"finer grained" dependency checking and therefore cause fewer aborts due to
OCC.
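
To illustrate the difference I have in mind, a schematic Python sketch with
made-up names (not any real validation code):

def valid_coarse(read_keys, start_ts, commit_ts, last_write_ts):
    # Coarse check: abort if any key we read was overwritten between
    # the transaction's start and commit timestamps.
    return not any(start_ts < last_write_ts[k] <= commit_ts
                   for k in read_keys)

def valid_per_key(read_ts, last_write_ts):
    # Finer grained: abort only if a key changed after the timestamp
    # at which this transaction actually read it.
    return all(last_write_ts[k] <= read_ts[k] for k in read_ts)

# A write at ts=5 to key "a", which we read at ts=7 inside a transaction
# spanning ts=4..9, fails the coarse check but passes the per-key one:
print(valid_coarse({"a"}, 4, 9, {"a": 5}))  # False
print(valid_per_key({"a": 7}, {"a": 5}))    # True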

henrik


Re: Tradeoffs for Cassandra transaction management

2021-10-13 Thread Henrik Ingo
Sorry Jonathan, didn't see this reply earlier today.

That would be common behaviour for many MVCC databases, including MongoDB,
MySQL Galera Cluster, PostgreSQL...

https://www.postgresql.org/docs/9.5/transaction-iso.html

*"Applications using this level must be prepared to retry transactions due
to serialization failures."*

On Wed, Oct 13, 2021 at 3:19 AM Jonathan Ellis  wrote:

> Hi Henrik,
>
> I don't see how this resolves the fundamental problem that I outlined to
> start with, namely, that without having the entire logic of the transaction
> available to it, the server cannot retry the transaction when concurrent
> changes are found to have been applied after the reconnaissance reads (what
> you call the conversational phase).
>
> On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo 
> wrote:
>
> > Hi all
> >
> > I was expecting to stay out of the way while a vote on CEP-15 seemed
> > imminent. But discussing this tradeoffs thread with Jonathan, he
> encouraged
> > me to say these points in my own words, so here we are.
> >
> >
> > On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
> >  wrote:
> >
> > > 1. Is it worth giving up local latencies to get full global
> consistency?
> > > Most LWT use cases use
> > > LOCAL_SERIAL.
> > >
> > > This isn’t a tradeoff that needs to be made. There’s nothing about
> Accord
> > > that prevents performing consensus in one DC and replicating the writes
> > to
> > > others. That’s not in scope for the initial work, but there’s no reason
> > it
> > > couldn’t be handled as a follow on if needed. I agree with Jeff that
> > > LOCAL_SERIAL and LWTs are not usually done with a full understanding of
> > the
> > > implications, but there are some valid use cases. For instance, you can
> > > enable an OLAP service to operate against another DC without impacting
> > the
> > > primary, assuming the service can tolerate inconsistency for data
> written
> > > since the last repair, and there are some others.
> > >
> > >
> > Let's start with the stated goal that CEP-15 is intended to be a better
> > version of LWT.
> >
> > Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
> > LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
> > improvement over LWT. I don't agree that Accord will just be so much
> faster
> > anyway, that it would compensate a single network roundtrip around the
> > world. Four LWT round-trips with LOCAL_SERIAL will still only be on the
> > order of 10 ms, but global latencies for just a single round trip are
> > hundreds of ms.
> >
> > So, my suggestion to resolve this discussion would be that "local quorum
> > latency experience" should be included in CEP-15 to meet its stated goal.
> > If I have understood the CEP process correctly, this merely means that we
> > agree this is a valid and significant use case in the Cassandra
> ecosystem.
> > It doesn't mean that everything in the CEP must be released in a single
> v1
> > release. At least personally I don't necessarily need to see a very
> > detailed design for the implementation. But I'm optimistic it would
> resolve
> > one open discussion if it was codified in the CEP that this is a use case
> > that needs to be addressed.
> >
> >
> > > 2. Is it worth giving up the possibility of SQL support, to get the
> > > benefits of deterministic transaction design?
> > >
> > > This is a false dilemma. Today, we’re proposing a deterministic
> > > transaction design that addresses some very common user pain points.
> SQL
> > > addresses different user pain point. If someone wants to add an sql
> > > implementation in the future they can a) build it on top of accord b)
> > > extend or improve accord or c) implement a separate system. The right
> > > choice will depend on their goals, but accord won’t prevent work on it,
> > the
> > > same way the original lwt design isn’t preventing work on
> multi-partition
> > > transactions. In the worst case, if the goals of a hypothetical sql
> > project
> > > are different enough to make them incompatible with accord, I don’t see
> > any
> > > reason why we couldn’t have 2 separate consensus systems, so long as
> > people
> > > are willing to maintain them and the use cases and available
> technologies
> > > justify it.
> > >
> >
> >
> >
> > The part of the discussion that's hard to deal wi

Re: [IDEA] Read committed transaction with Accord

2021-10-14 Thread Henrik Ingo
Thanks for clarifying Jonathan. I agree with your example.

It seems we have now moved into discussing specific requirements/semantics
for an interactive transaction implementation. Which is interesting, but
beyond what I will have time to think about tonight. At least off the top
of my head I can't say I have any data or experience to say how important
it is to satisfy the use case you are outlining.

As a gut feeling, I believe the alternative proposal outlined by Alex and
Benedict would take the kind of locks in the database nodes that you
describe. But
again, we'll have to return to this another day as my today is almost over.

henrik

On Thu, Oct 14, 2021 at 7:39 PM Jonathan Ellis  wrote:

> ... which is a long way of saying, in postgresql those errors are there as
> part of checking for correctness -- when you see them it means you did not
> ask for the appropriate locks.  It's not expected that you should write
> try/catch/retry loops to work around this.
>
> On Thu, Oct 14, 2021 at 11:13 AM Jonathan Ellis  wrote:
>
> > [Moving followup here from the other thread]
> >
> > I think there is in fact a difference here.
> >
> > Consider a workload consisting of two clients.  One of them is submitting
> > a stream of TPC-C new order transactions (new order client = NOC), and
> the
> > other is performing a simple increment of district next order ids
> > (increment district client = IDC).
> >
> > If we run these two workloads in postgresql under READ COMMITTED, both
> > clients will proceed happily (although we will get serialization
> anomalies).
> >
> > If we run them in pg under SERIALIZABLE, then the NOC client will get the
> > "could not serialize access" error whenever the IDC client updates the
> > district concurrently, which will be effectively every time since the IDC
> > transaction is much simpler.  But, SQL gives you a tool to allow NOC to
> > make progress, which is SELECT FOR UPDATE.  If the NOC performs its first
> > read with FOR UPDATE then it will (1) block until the current IDC
> > transaction completes and then (2) grab a lock that prevents further
> > updates from happening concurrently, allowing NOC to make progress.
> > Neither NOC nor IDC will ever get a "could not serialize access" error.
> >
> > It looks to me like the proposed design here would (1) not allow NOC to
> > make progress at READ COMMITTED, but also (2) does not provide the tools
> to
> > achieve progress with SERIALIZABLE either since locking outside of the
> > global consensus does not make sense.
> >
> > On Wed, Oct 13, 2021 at 1:59 PM Henrik Ingo 
> > wrote:
> >
> >> Sorry Jonathan, didn't see this reply earlier today.
> >>
> >> That would be common behaviour for many MVCC databases, including
> MongoDB,
> >> MySQL Galera Cluster, PostgreSQL...
> >>
> >>
> https://urldefense.com/v3/__https://www.postgresql.org/docs/9.5/transaction-iso.html__;!!PbtH5S7Ebw!KP0b2eRHpf-D6w1012nea4UbnsxtFn-zUEBrAZ7ghBFDr_QQyTT6qHzgZ0KKUKxt_64$
> >>
> >> *"Applications using this level must be prepared to retry transactions
> due
> >> to serialization failures."*
> >>
> >> On Wed, Oct 13, 2021 at 3:19 AM Jonathan Ellis 
> wrote:
> >>
> >> > Hi Henrik,
> >> >
> >> > I don't see how this resolves the fundamental problem that I outlined
> to
> >> > start with, namely, that without having the entire logic of the
> >> transaction
> >> > available to it, the server cannot retry the transaction when
> concurrent
> >> > changes are found to have been applied after the reconnaissance reads
> >> (what
> >> > you call the conversational phase).
> >
> >
> > On Wed, Oct 13, 2021 at 5:00 AM Henrik Ingo 
> > wrote:
> >
> >> On Wed, Oct 13, 2021 at 1:26 AM Blake Eggleston
> >>  wrote:
> >>
> >> > Hi Henrik,
> >> >
> >> > I would agree that the local serial experience for valid use cases
> >> should
> >> > be supported in some form before legacy LWT is replaced by Accord.
> >> >
> >> >
> >> Great! It seems there's a seed of consensus on this point.
> >>
> >>
> >> > Regarding your read committed proposal, I think this CEP discussion
> has
> >> > already spent too much time talking about hypothetical SQL
> >> implementations,
> >> > and I’d like to avoid veering off course again. However, since you’ve
> >> asked
> >> > a well thought out qu

Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Henrik Ingo
On Fri, Oct 15, 2021 at 3:37 AM Dinesh Joshi 
wrote:

> On 10/14/21 6:54 AM, Jonathan Ellis wrote:
>
> > I think I've also been clear that I want a path to supporting (1) local
> > latencies (SLOG is a more elegant solution but "let's just let people
> give
> > up global serializability like LWT" is also reasonable) and (2) SQL with
> > interactive transactions.
>
>
> 99% of the transactions in a system will not be performed as interactive
> SQL transactions by a human. We should be optimizing for the 99%.
>
>
"Interactive" here does not mean that it's a human typing the queries. It
rather means that there is more than one round trip between the client
and the server.

Any application doing:

BEGIN
x = SELECT x FROM ...
if x == 5:
    UPDATE t SET y=6
COMMIT

...would be an interactive transaction. And this is traditionally the
common case, even if recent NewSQL and NoSQL databases have introduced some
intriguing outside-the-box thinking in this area.

henrik



Re: Tradeoffs for Cassandra transaction management

2021-10-15 Thread Henrik Ingo
On Fri, Oct 15, 2021 at 5:54 PM Dinesh Joshi 
wrote:

> Thank you for clarifying the terminology. I haven’t honestly heard anybody
> call these as interactive transactions. Therefore it is very crucial that
> we lay out things systematically so everyone is on the same page. You’re
> talking about bundling several statements into a single SQL transaction
> block.
>
>
Well, it's more complicated than that. Systems like Calvin and VoltDB have
introduced concepts where you can bundle several statements into a single
transaction block, but that block is executed server-side, and it's not
possible to have any additional roundtrips to the client. So the term
"interactive transactions" is supposed to distinguish from those. But
you're right that I may have invented the term. Historically such transactions
were the norm so no additional qualifier has been needed.

henrik




Re: [DISCUSS] CEP-18: Improving Modularity

2021-10-25 Thread Henrik Ingo
Hi Benedict

This CEP is a bundle of APIs arising out of our recent work to re-architect
Cassandra into a more cloud-native architecture. What our product marketing
has chosen to call "Serverless" is a variant of Cassandra where we have
separated compute from storage (coordinator vs data node), used S3-like
storage, and made various improvements to better support multi-tenancy in a
single Cassandra (Serverless) cluster. This whitepaper [1] explains this
work in detail for those of you interested to learn more. (Apologies that
it requires registration and the first page may at times sound a bit
marketingy, but it's really the most detailed report we have published so
far.)

[1] https://www.datastax.com/resources/whitepaper/astra-serverless

The above work was implemented in a way where by default a user can
continue to run Cassandra in the familiar "classic" way. The APIs
introduced by CEP-18 on the other hand allow alternate or additional
functionality to be provided, which in our case we have used to create a
"serverless" way of deploying a Cassandra cluster.

The logic behind proposing this bundle of APIs separately is roughly for
these reasons:

The APIs touch existing code and functionality, so to minimize risk to the
next Cassandra release, it would make sense to try to complete merging this
work as early as possible in the development cycle. For the same reason,
keeping the new implementations out of this CEP allows us to focus review -
both of the CEP, and the eventual pull requests - on the APIs themselves,
whereas the related implementations (or plug-ins) would add to the scope
quite significantly. On the other hand non-default plugin functionality can
be added later with much lower risk.

Second, while it's completely fair to ask for context (why was this
particular refactoring or API done in the first place?), the assumption for a
CEP like this one is that better-defined interfaces, better documented and
with better test coverage than the existing code, should have enough legs to
stand on in themselves. Also, in the best case a good API will
also enable other implementations than the one we had in mind when
developing the API, so we wouldn't want to tie the discussion too much into
the implementation that happened to be the first. (As an example of this
working out nicely, your own work in CASSANDRA-16926 was for you motivated
by enabling a new kind of testing, but it also just so happens it is the
same work that enables someone to implement remote file storage, which we
therefore could drop from this CEP-18.)

Conversely, it was our expectation when proposing this CEP that
"better modularity" at least on a high level should be a fairly
straightforward conversation, while the actual plugins that make up our
"serverless" new architecture may reasonably ignite much more debate, or at
least questions as to how they work. As we have a backlog of several fairly
substantial CEPs lined up, we are trying to be very mindful of the
bandwidth of the developers on this list. For example, last week Jacek also
proposed CEP-17 for discussion. So we are trying to focus the discussion on
what's in CEP-17 and CEP-18 for now. (In addition I remember at least 2
CEPs that were discussed but not yet voted on. I don't know if this adds to
cognitive load for anyone other than myself.)

henrik

On Mon, Oct 25, 2021 at 12:39 PM bened...@apache.org 
wrote:

> Hi Jeremiah,
>
> My personal view is that work to modularise the codebase should be tied to
> specific use cases. If improved testing is the purpose of this work, I
> think it would help to include those improved tests that you plan to
> support as goals for the CEP.
>
> If on the other hand some of this work is primarily intended to enable
> certain features, I personally think it would be preferable to tie them to
> those features - perhaps with their own CEP?
>
>
> From: Jeremiah Jordan 
> Date: Friday, 22 October 2021 at 16:24
> To: Cassandra DEV 
> Subject: [DISCUSS] CEP-18: Improving Modularity
> Hi All,
> As has been seen with the work already started in CEP-10, increasing the
> modularity of our subsystems can improve their testability, and also the
> ability to try new implementations without breaking things.
>
> Our team has been working on doing this and CEP-18 has been created to
> propose adding more modularity to a few different subsystems.
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-18%3A+Improving+Modularity
>
> CASSANDRA-17044 has already been created for Schema Storage changes related
> to this work and more JIRAs and PRs are to follow for the other subsystems
> proposed in the CEP.
>
> Thanks,
> -Jeremiah Jordan
>



Re: [DISCUSS] Creating a new slack channel for newcomers

2021-11-12 Thread Henrik Ingo
Btw, it would be possible to have a welcome-bot on Slack. It would
recognize new users joining the channel and send them a friendly
message. Part of the message could be the list of friendly volunteers
available to help newcomers.

On Fri, Nov 12, 2021 at 6:57 PM Benjamin Lerer  wrote:

> Thanks a lot everybody for the feedback.
>
> It sounds that most of you are not convinced by the slack channel but
> believe that a list of newcomer mentors will be more efficient.
> That sounds like a good approach to me.
>
> I will start a new discussion to collect the list of volunteers.
>
> Le mar. 9 nov. 2021 à 17:12, Joseph Lynch  a écrit
> :
>
> > I also feel that having all the resources to get help in more or less
> > one place (#cassandra-dev slack / ML) probably helps newcomers on the
> > whole since they can ask questions and likely engage with someone who
> > can help. I know that I've asked a few silly questions in
> > #cassandra-dev and appreciated that there were more experienced
> > project members to help answer them.
> >
> > If we wanted to have a set of designated "newcomer mentors" or some
> > such that seems useful in addition. Perhaps their email/handles on the
> > website in the contributing section with an encouragement to ask them
> > first if you're unsure who to ask?
> >
> > -Joey
> >
> > On Tue, Nov 9, 2021 at 10:16 AM Sumanth Pasupuleti
> >  wrote:
> > >
> > > +1 that existing channels of communication (cassandra-dev slack and
> > mailing
> > > lists) should ideally suffice, and I have not seen prohibitive
> > > communication in those forums thus far that goes against newcomers. I
> > agree
> > > it can be intimidating, but to Bowen's point, the more traffic we see
> > > around newcomers in those forums, the more comfortable it gets.
> > > I agree starting a new channel is a low effort experiment we can do,
> but
> > > the success depends on finding mentors and the engagement of mentors
> vs I
> > > believe engagement in #cassandra-dev is almost guaranteed given the
> high
> > > number of people in the channel.
> > >
> > > Thanks,
> > > Sumanth
> > >
> > > On Tue, Nov 9, 2021 at 6:47 AM Bowen Song 
> wrote:
> > >
> > > > As a newcomer (made two commits since October) who has been watching
> > > > this mailing list since then, I don't like the idea of a separate
> > > > channel for beginner questions. The volume in this mailing list is
> > > > fairly low, I can't see any legitimate reason for diverting a portion
> > of
> > > > that into another channel, further reducing the volume in the
> existing
> > > > channel and perhaps not creating much volume in the new channel
> either.
> > > >
> > > > Personally, I think a clearly written and easy to find community
> > > > guideline highlighting that this mailing list is suitable for
> beginner
> > > > questions, and give some suggestions/recommendations on when, where
> and
> > > > how to ask beginner questions would be more useful.
> > > >
> > > > At the moment because the volume of beginner questions is very very
> low
> > > > in this mailing list, newcomers like me don't feel comfortable asking
> > > > questions here. That's not because there's 600 pair of eyes watching
> > > > this (TBH, if you didn't mention it, I wouldn't have noticed it), but
> > > > because the herd mentality. If not many questions are asked here,
> most
> > > > people won't start doing that. It's all about creating the
> environment
> > > > that makes people feel comfortable asking questions here.
> > > >
> > > > On 08/11/2021 16:28, Benjamin Lerer wrote:
> > > > > Hi everybody,
> > > > >
> > > > > Aleksei Zotov mentioned to me that it was a bit intimidating for
> > > > newcomers
> > > > > to ask beginner questions in the cassandra-dev channel as it has
> > over 600
> > > > > followers and that we should probably have a specific channel for
> > > > > newcomers.
> > > > > This proposal makes total sense to me.
> > > > >
> > > > > What is your opinion on this? Do you have any concerns about it?
> > > > >
> > > > > Benjamin
> > > > >
> > > >

Re: [DISCUSS] Releasable trunk and quality

2021-11-17 Thread Henrik Ingo
> ...volumes of work coming down the pike with CEPs, this seems
> like a good time to at least check in on this topic as a community.
>
> Full disclosure: running face-first into 60+ failing tests on trunk when
> going through the commit process for denylisting this week brought this
> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> and those test failures riled me up... I sense a pattern ;))
>
> Looking forward to hearing what people think.
>
> ~Josh
>




Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Henrik Ingo
Grepping is an important use case, and having worked with another database
that does nest its configs, I can offer some tips on how I survived:

With good old grep, it can help to use the before and after options:

grep -A 5 track_warnings cassandra.yaml | grep -B 5 warn_threshold

Would find you this:

track_warnings:
    enabled: true
    coordinator_read_size:
        warn_threshold: 10kb

It would require magic expert knowledge to guess the right numbers for -A
and -B, but you could just use a large number and it will work in most
cases.

For more frequent use, you will want to just install `yq` (aka yaml query):
https://github.com/kislyuk/yq
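
For example, assuming that yq's jq-style filter syntax and the nested
layout above, something like this should print just the one value:

yq -r '.track_warnings.coordinator_read_size.warn_threshold' cassandra.yaml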

henrik


On Fri, Nov 19, 2021 at 9:07 PM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> Hi David,
>
> while I do not oppose nested structure, it is really handy to grep
> cassandra.yaml on some config key and you know the value instantly.
> This is not possible when it is nested (easily & fastly) as it is on
> two lines. Or maybe my grepping is just not advanced enough to cover
> this case? If it is flat, I can just grep "track_warnings" and I have
> them all.
>
> Can you elaborate on your last bullet point? Parsing layer ... What do
> you mean specifically?
>
> Thanks
>
> On Fri, 19 Nov 2021 at 19:36, David Capwell  wrote:
> >
> > This has been brought up in a few tickets, so pushing to the dev list.
> >
> > CASSANDRA-15234 - Standardise config and JVM parameters
> > CASSANDRA-16896 - hard/soft limits for queries
> > CASSANDRA-17147 - Guardrails prototype
> >
> > In short, do we as a project wish to move "new features" into nested
> > YAML when the feature has "enough" to justify the nesting?  I would
> > really like to focus this discussion on new features rather than
> > retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > already a place to talk about that.
> >
> > To get things started, let's start with the track-warning feature
> > (hard/soft limits for queries), currently the configs look as follows
> > (assuming 15234)
> >
> > track_warnings:
> >     enabled: true
> >     coordinator_read_size:
> >         warn_threshold: 10kb
> >         abort_threshold: 1mb
> >     local_read_size:
> >         warn_threshold: 10kb
> >         abort_threshold: 1mb
> >     row_index_size:
> >         warn_threshold: 100mb
> >         abort_threshold: 1gb
> >
> > or should this be "flat"
> >
> > track_warnings_enabled: true
> > track_warnings_coordinator_read_size_warn_threshold: 10kb
> > track_warnings_coordinator_read_size_abort_threshold: 1mb
> > track_warnings_local_read_size_warn_threshold: 10kb
> > track_warnings_local_read_size_abort_threshold: 1mb
> > track_warnings_row_index_size_warn_threshold: 100mb
> > track_warnings_row_index_size_abort_threshold: 1gb
> >
> > For me I prefer nested for a few reasons
> > * easier to enforce consistency as the configs can use shared types;
> > in the track warnings patch I had mismatches across configs (warn vs
> > warns, fail vs abort, etc.) before going nested, now everything reuses
> > the same types
> > * even though it is longer, it is clearer how things are related
> > * parsing layer can add support for mixed or purely flat depending on
> > user preference (example:
> > track_warnings.row_index_size.abort_threshold, using the '.' notation
> > to represent nested structures)
> >
> > Thoughts?
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Paxos repairs in CEP-14

2021-12-04 Thread Henrik Ingo
Could someone elaborate on this section



*Paxos Repair*
We will introduce a new repair mechanism, that can be run with or without
regular repair. This mechanism will:

   - Track, per-replica, transactions that have been witnessed as initiated
   but have not been seen to complete
   - For a majority of replicas complete (either by invalidating,
   completing, or witnessing something newer) all operations they have
   witnessed as incomplete prior to the initiation of repair
   - Globally invalidate all promises issued prior to the most recent paxos
   repair



Specific questions:

Assuming a table only using these LWT:s

* As the repair is only guaranteed for a majority of replicas, I assume I
can discover somewhere which replicas are up to date like this?

* Do I understand correctly, that if I take a backup from such a replica,
it is guaranteed to contain the full state up to a certain timestamp t?
(And in addition may or may not contain mutations higher than t, which of
course could overwrite the value the same key had at t.)

* Does the replica also end up with a complete and continuous log of all
writes until t? If not, does a merge of all logs in the majority contain a
complete log? In particular, I'm trying to parse the significance of "or
witnessing something newer"? (Use case for this last question could be
point in time restore, aka continuous backup, or also streaming writes to a
downstream system.)

henrik
-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: Paxos repairs in CEP-14

2021-12-05 Thread Henrik Ingo
On Sun, 5 Dec 2021, 1.45 bened...@apache.org,  wrote:

> > As the repair is only guaranteed for a majority of replicas, I assume I
> can discover somewhere which replicas are up to date like this?
>
> I’m not quite sure what you mean. Do you mean which nodes have
> participated in a paxos repair? This information isn’t maintained, but
> anyway would not imply the node is up to date. A node participating in a
> paxos repair ensures _a majority of other nodes_ are up-to-date with _its_
> knowledge, give or take.


Ah, thanks for clarifying. Indeed I was assuming the paxos repair happens
the opposite way.


By performing this on a majority of nodes, we ensure a majority of replicas
> has a lower bound on the knowledge of a majority, and we effectively
> invalidate any in-progress operations on any minority that did not
> participate.


And at the end of the repair, this lower bound is known and stored
somewhere?


> > Do I understand correctly, that if I take a backup from such a replica,
> it is guaranteed to contain the full state up to a certain timestamp t?
>
> No, you would need to also perform regular repair afterwards. If you
> perform a regular repair, by default it will now be preceded by a paxos
> repair (which is typically very quick), so this will in fact hold, but
> paxos repair won’t enforce it.


Ok, so I'm trying to understand this...

At the end of a Paxos repair, it is guaranteed that each LWT transaction
has arrived at a majority of replicas. However, it's still not guaranteed
that any single node would contain all transactions, because it could have
been in a minority partition for some transactions. Correct so far?

Under good conditions, I assume the result of a paxos repair is that all
nodes received all LWT transactions from all other replicas? If some node
is unavailable, that same node will be missing a bunch of transactions that
it didn't receive repairs for?


I'm thinking through this as I type, but I guess where I'm going is: in the
universe of possible future work, does there exist a not-too-complex
modification to CEP-14 where:

1. Node 1 concludes that a majority of its replicas appear to be available,
and does its best to send all of its repairs to all of the replicas in that
majority set.

2. Node 2 is able to learn that Node 1 successfully sent all of its repair
writes to this set, and makes an attempt to do the same. If there are
replicas in the set that it can't reach, they can be subtracted from the
set, but the set still needs to contain a majority of replicas in the end.

3. At the end of all nodes doing the above, we would be left with a
majority set of nodes that are known to - each individually - contain all
LWT transactions up to the timestamp t.

4. A benefit of 3: A node N is not in the above majority set. It can now
repair itself by communicating with a single node from the majority set,
and copy its transaction log up to timestamp t. After doing so, it can join
the majority set, as it now contains all transactions up to t.

5. For a longer outage it may not be possible for node N to ever catch up
by replaying a serial transaction log. (Including because an old
enough log may no longer be available.) In this case traditional streaming
repair would still be used.

Based on your first reply, I guess none of the above is strictly needed to
achieve the use case I outlined (backup, point in time restore,
streaming...). It seems I'm attracted by the potential for simplicity of a
setup where traditional repair is only needed as a fallback option.
(Ultimately it's needed to bootstrap empty nodes anyway, so it wouldn't go
away.)





> > Does the replica also end up with a complete and continuous log of all
> writes until t? If not, does a merge of all logs in the majority contain a
> complete log?
>
> A majority. There is also no log that gets replicated for LWTs in
> Cassandra. There is only ever at most one transaction that is in flight
> (and that may complete) and whose result has not been persisted to some
> majority, for any key. Paxos repair + repair means the result of the
> implied log are replicated to all participants.


I understand that Cassandra's LWT replication isn't based on replicating a
single log. However I'm interested to understand whether it would be
possible to end up with such a log as an outcome of the Paxos
replication/repair process, since such a log can have other uses.

Even with all of the above, I'm still left wondering: does the repair
process (with the above modification, say) result in a node having all
writes that happened before t, or is it only guaranteed to have the most
recent value for each primary key?


Henrik

>
> From: Henrik Ingo 
> Date: Saturday, 4 December 2021 at 23:12
> To: dev@cassandra.apache.org 
> Subject: Paxos repairs in CEP-14
> Cou

Re: Paxos repairs in CEP-14

2021-12-05 Thread Henrik Ingo
On Sun, 5 Dec 2021, 18.40 bened...@apache.org,  wrote:

> > And at the end of the repair, this lower bound is known and stored
> somewhere?
>
> Yes, there is a new system.paxos_repair_history table
>
> > Under good conditions, I assume the result of a paxos repair is that all
> nodes received all LWT transactions from all other replicas?
>
> All in progress LWTs are flushed, essentially. They are either completed
> or invalidated. So there is a synchronisation point for the range being
> repaired, but there is no impact on any completed transactions. So even if
> paxos repair successfully sync’d all in progress transactions to every
> node, there could still be some past transactions that were persisted only
> to a majority of nodes, and these will be invisible to the paxos repair
> mechanism.


Cool. This clarifies.


There is no transaction log today in Cassandra to sync, so repair of the
> underlying data table is still the only way to guarantee data is
> synchronised to every node.
>

It's not the transaction log as such that I'm missing. (Or it is, but I
understand there isn't one.) What is hard to wrap my head around is how a
given partition can participate in a successful Paxos transaction even if
it might be completely unaware of the previous transaction to the same
partition. At least this is how I've understood this conversation?


> CEP-15 will change this, so that nodes will be fully consistent up to some
> logical timestamp, but CEP-14 does not change the underlying semantics of
> LWTs and Paxos in Cassandra.
>

Yes, looking forward to that. I just wanted to check whether CEP-14 would
possibly contain some per-partition version of the same ideas.

But even with everything you've explained, did I understand correctly that
(focusing on a single partition and only LWT writes...) I can in any event
stream commit logs from a majority of replicas, merge them, and such a
merged log must contain all committed transactions to that partition? (And
this should have nothing to do with the repair, then?)

Henrik



>
>
>
>
> From: Henrik Ingo 
> Date: Sunday, 5 December 2021 at 11:45
> To: dev@cassandra.apache.org 
> Subject: Re: Paxos repairs in CEP-14
> On Sun, 5 Dec 2021, 1.45 bened...@apache.org,  wrote:
>
> > > As the repair is only guaranteed for a majority of replicas, I assume I
> > can discover somewhere which replicas are up to date like this?
> >
> > I’m not quite sure what you mean. Do you mean which nodes have
> > participated in a paxos repair? This information isn’t maintained, but
> > anyway would not imply the node is up to date. A node participating in a
> > paxos repair ensures _a majority of other nodes_ are up-to-date with
> _its_
> > knowledge, give or take.
>
>
> Ah, thanks for clarifying. Indeed I was assuming the paxos repair happens
> the opposite way.
>
>
> By performing this on a majority of nodes, we ensure a majority of replicas
> > has a lower bound on the knowledge of a majority, and we effectively
> > invalidate any in-progress operations on any minority that did not
> > participate.
>
>
> And at the end of the repair, this lower bound is known and stored
> somewhere?
>
>
> > > Do I understand correctly, that if I take a backup from such a replica,
> > it is guaranteed to contain the full state up to a certain timestamp t?
> >
> > No, you would need to also perform regular repair afterwards. If you
> > perform a regular repair, by default it will now be preceded by a paxos
> > repair (which is typically very quick), so this will in fact hold, but
> > paxos repair won’t enforce it.
>
>
> Ok, so I'm trying to understand this...
>
> At the end of a Paxos repair, it is guaranteed that each LWT transaction
> has arrived at a majority of replicas. However, it's still not guaranteed
> that any single node would contain all transactions, because it could have
> been in a minority partition for some transactions. Correct so far?
>
> Under good conditions, I assume the result of a paxos repair is that all
> nodes received all LWT transactions from all other replicas? If some node
> is unavailable, that same node will be missing a bunch of transactions that
> it didn't receive repairs for?
>
>
> I'm thinking through this as I type, but I guess where I'm going is: in the
> universe of possible future work, does there exist a not-too-complex
> modification to CEP-14 where:
>
> 1. Node 1 concludes that a majority of its replicas appear to be available,
> and does its best to send all of its repairs to all of the replicas in that
> majority set.
>
> 2. Node 2 is able to learn that Node 1 successfully sent all of it

Re: [DISCUSS] Releasable trunk and quality

2021-12-21 Thread Henrik Ingo
FWIW, I thought I could link to an example MongoDB commit:

https://github.com/mongodb/mongo/commit/dec388494b652488259072cf61fd987af3fa8470

* Fixes start from trunk or whatever is the highest version that includes
the bug
* It is then cherry picked to each stable version that needs the fix. The
link above is an example of such a cherry pick. The original sha is
referenced in the commit message.
* I found that it makes sense to always cherry pick from the immediately
higher version, since if you had to make changes to the previous commit,
they probably need to be in the next one as well (see the git sketch below).
* There are no merge commits. Everything is always cherry picked or rebased
to the top of a branch.
* Since this was mentioned, MongoDB indeed tracks the cherry picking
process explicitly: The original SERVER ticket is closed when the fix is
committed to the trunk branch. However, new BACKPORT tickets are created and
linked to the SERVER ticket, one per stable version that will need a
cherry-pick. This way backporting the fix is never forgotten, as the team
can just track open BACKPORT tickets and work on them to close them.
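
To make the mechanics concrete, here is a minimal sketch of that flow with
stock git (branch names, ticket ids and SHAs are hypothetical; MongoDB's
actual tooling may differ):

# the fix lands on trunk first, producing <trunk-sha>
git checkout trunk
git commit -m "SERVER-12345 Fix the bug"

# backport to the highest stable branch; -x appends
# "(cherry picked from commit <trunk-sha>)" to the commit message
git checkout v5.0
git cherry-pick -x <trunk-sha>

# continue downwards, always picking from the immediately higher version
git checkout v4.4
git cherry-pick -x <v5.0-sha>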

henrik

On Tue, Dec 14, 2021 at 8:53 PM Joshua McKenzie 
wrote:

> >
> > I like a change originating from just one commit, and having tracking
> > visible across the branches. This gives you immediate information about
> > where and how the change was applied without having to go to the jira
> > ticket (and relying on it being accurate)
>
> I have the exact opposite experience right now (though this may be a
> shortcoming of my env / workflow). When I'm showing annotations in intellij
> and I see walls of merge commits as commit messages and have to bounce over
> to a terminal or open the git panel to figure out what actual commit on a
> different branch contains the minimal commit message pointing to the JIRA
> to go to the PR and actually finally find out _why_ we did a thing, then
> dig around to see if we changed the impl inside a merge commit SHA from the
> original base impl...
>
> Well, that is not my favorite.  :D
>
> All ears on if there's a cleaner way to do the archaeology here.
>
>
> On Tue, Dec 14, 2021 at 1:34 PM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
> > Does somebody else use the git workflow we do as of now in Apache
> > universe? Are not we quite unique? While I do share the same opinion
> > Mick has in his last response, I also see the disadvantage in having
> > the commit history polluted by merges. I am genuinely curious if there
> > is any other Apache project out there doing things same we do (or did
> > in the past) and who changed that in one way or the other, plus
> > reasons behind it.
> >
> > On Tue, 14 Dec 2021 at 19:27, Mick Semb Wever  wrote:
> > >
> > > >
> > > >
> > > > >   Merge commits aren’t that useful
> > > > >
> > > > I keep coming back to this. Arguably the only benefit they offer now
> is
> > > > procedurally forcing us to not miss a bugfix on a branch, but given
> how
> > > > much we amend many things presently anyway that dilutes that benefit.
> > > >
> > >
> > >
> > > Doesn't this come down to how you read git history, and for example
> > > appreciating a change-centric view over branch isolated development?
> > > I like a change originating from just one commit, and having tracking
> > > visible across the branches. This gives you immediate information about
> > > where and how the change was applied without having to go to the jira
> > > ticket (and relying on it being accurate). Connecting commits on
> > different
> > > branches that are developed separately (no merge tracking) is more
> > > complicated. So yeah, I see value in those merge commits. I'm not
> against
> > > trying something new, just would appreciate a bit more exposure to it
> > > before making a project wide change. Hence, let's not rush it and just
> > > start first with trunk.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] Periodic snapshot publishing with minor version bumps

2021-12-21 Thread Henrik Ingo
Just some observations from the proposal and reading the thread. I'm not
arguing for any one option in particular.

1) Always increase first digit for real releases

The potential for confusion (which versions are stable releases?) can be
avoided by following Mick's proposal + always increasing the first number
for actual major releases. (Maybe this isn't exactly SemVer though? But it
seems it would work for what the project wants to do.) Examples:

4.0.0 (major release), 4.0.1, 4.0.2... (bug fixes only). 4.1.0, 4.2.0,
4.3.0... (quarterly snapshots), 5.0.0 (next major release)

Note that this would also leave an opportunity for a 4.1.1, should someone
wish to release a fix for one of the snapshot releases.

2) Use alpha for snapshots

The project could choose to use 4.1.0-alpha1, 4.1.0-alpha2, 4.1.0-alpha3
for the snapshots. Note that this doesn't prevent also releasing the
traditional alpha releases prior to the major release, but those would then
be alpha4, alpha5...


henrik


On Thu, Dec 16, 2021 at 5:04 PM Mick Semb Wever  wrote:

> Back in January¹ we agreed to do periodic snapshot publishing, as we
> move to yearly major+minor releases. But (it's come to light²) it
> wasn't clear how we would do that.
>
> ¹) https://lists.apache.org/thread/vzx10600o23mrp9t2k55gofmsxwtng8v
> ²)
> https://urldefense.com/v3/__https://the-asf.slack.com/archives/CK23JSY2K/p1638950961325900__;!!PbtH5S7Ebw!YLIo_rYNkoRt7nBd3auSAet3mv-3eCpKn1ydsdWoCDswps68GFzapG7nniNJgB4YvVvE11i0B5_r-w$
>
>
> The following is a proposal on doing such snapshot publishing by
> bumping the minor version number.
>
> The idea is to every ~quarter in trunk bump the minor version in
> build.xml. No release or branch would be cut. But the last SHA on the
> previous snapshot version can be git tagged. It does not need to
> happen every quarter, we can make that call as we go depending on how
> much has landed in trunk.
>
> The idea of this approach is that it provides a structured way to
> reference these periodic published snapshots. That is, the semantic
> versioning that our own releases abide by extends to these periodic
> snapshots. This can be helpful as the codebase (and drivers) does not
> like funky versions (like putting in pre-release or vendor labels), as
> we want to be inclusive to the ecosystem.
>
> A negative reaction of this approach is that our released versions
> will jump minor versions. For example, next year's release could be
> 4.3.0 and users might ask what happened to 4.1 and 4.2. This should
> only be a cosmetic concern, and general feedback seems to be that
> users don't care so long as version numbers are going up, and that we
> use semantic versioning so that major version increments mean
> something (we would never jump a major version).
>
> A valid question is how would this impact our supported upgrade paths.
> Per semantic versioning any 4.x to 4.y (where y > x) is always safe,
> and any major upgrade like 4.z to 5.x is safe (where z is the last
> 4.minor). Nonetheless we should document this to make it clear and
> explicit how it works (and that we are adhering to semver).
>
> https://urldefense.com/v3/__https://semver.org/__;!!PbtH5S7Ebw!YLIo_rYNkoRt7nBd3auSAet3mv-3eCpKn1ydsdWoCDswps68GFzapG7nniNJgB4YvVvE11idIYkMUw$
>
> What are people's thoughts on this?
> Are there objections to bumping trunk so that base.version=4.2 ? (We
> can try this trunk and re-evaluate after our next annual release.)
>
> -----
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] Releasable trunk and quality

2022-01-06 Thread Henrik Ingo
On Wed, Jan 5, 2022 at 10:38 PM Mick Semb Wever  wrote:

> +(?) Is the devil we know
>>
>
> + Bi-directional relationship between patches showing which branches it
> was applied to (and how).  From the original commit or any of the merge
> commits I can see which branches, and where the original commit, was
> applied. (See the mongo example from Henrik, how do I see which other
> branches the trunk commit was committed to? do i have to start text
> searching the git history or going through the ticket system :-(
>

Just to answer the question, I'm obviously not that impacted by how
Cassandra commits happen myself...

Maybe a thing I wouldn't copy from MongoDB is that their commit messages
are often one-liners, and yes, you need to look up the jira ticket to read
what the commit does. A feature of this approach is that the jira ticket
can be edited later. But personally I always hated that the commit itself
didn't contain a description.

Other than that, yes I would grep or otherwise search for the ticket id or
githash, which must be included in the cherry picked commit. It never
occurred to me someone wouldn't like this. Note that the stable branches
eventually only get a handful of patches per month, so even just reading
the git log gives a nice overview. If anything, the "merge branch X into Y"
commits always confuse me, as I have no idea what they do.

As for verifying that each branch was patched, if you adhere to committing
first to trunk, then in descending order to each stable branch, you can
just check the oldest branch to verify the chain. Not everyone followed
this style, but I would always append the cherry pick message, so that the
last commit (to the oldest stable branch) would contain the chain of
githashes all the way to trunk.
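
For illustration, if each backport is done with `git cherry-pick -x`, the
commit on the oldest stable branch carries the whole chain in its message,
roughly like this (ticket id and SHAs hypothetical):

$ git log -1 --format=%B <oldest-branch-sha>
CASSANDRA-12345: Fix the bug

(cherry picked from commit <trunk-sha>)
(cherry picked from commit <next-stable-branch-sha>)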

henrik

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: Google Summer of Code 2022 - Kick-off

2022-01-16 Thread Henrik Ingo
ce-to-have" subtasks which are not a
>>> strict
>>> >> requirement of the CEP to be created with GSoC mentoring in mind. Ie.
>>> >> benchmark CEP, extend CEP with some additional but not-core
>>> functionality,
>>> >> CEP UX improvements, etc. With this in mind I'd like to ask CEP
>>> shepherds
>>> >> to create and tag these tasks with the "gsoc" tag.
>>> >>
>>> >> Please note that mentoring a GSoC project is not only a good way to
>>> attract
>>> >> new members to the community but also a great way of recruiting new
>>> members
>>> >> to your professional teams, so there's great personal, professional
>>> and
>>> >> community benefits to mentoring a GSoC project.
>>> >>
>>> >> Once we have a good amount of tasks with the "gsoc" tag I will work
>>> with
>>> >> prospective mentor to refine the tasks and create a "GSoC Project
>>> Ideas"
>>> >> page and we can write a blog post to announce the project ideas to
>>> >> prospective GSoC contributors.
>>> >>
>>> >> Please let me know what do you think.
>>> >>
>>> >> Paulo
>>> >>
>>> >> [1]
>>> >>
>>> >>
>>> https://opensource.googleblog.com/2021/11/expanding-google-summer-of-code-in-2022.html
>>> <https://urldefense.com/v3/__https://opensource.googleblog.com/2021/11/expanding-google-summer-of-code-in-2022.html__;!!PbtH5S7Ebw!d-LyAzADOf46Mk3mWpGP6v-7fdm2xlnB_dZ1K7cZACiPGSYx8e8_qeJdHvXcPXPBgJntHEvsi8uX-eGvPtsuCAyr7UE$>
>>> >>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>
>>>

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: Google Summer of Code 2022 - Kick-off

2022-01-21 Thread Henrik Ingo
Just as an update as it is Friday, I have heard back from about 3
interested mentors, with ideas mainly in the area of testing & performance.
But I'm still struggling to get everyone unlocked from their other
priorities to spend a few hours on finding or creating new jira tickets
for GSOC projects. I'm optimistic we will get to it before the end of the
month and I'll keep you updated through next week.

henrik

On Wed, Jan 19, 2022 at 3:20 PM Paulo Motta 
wrote:

> Thanks Henrik. I will send another thread shortly with more details on how
> to volunteer to be a GSoC mentor as well as submit project ideas.
>
> On Sun, 16 Jan 2022 at 19:13, Henrik Ingo <
> henrik.i...@datastax.com> wrote:
>
>> Hi Paulo
>>
>> In my experience, it works best when the mentoring organization is clear
>> about who is the mentor, and what is the project each mentor will be
>> mentoring.
>>
>> Tomorrow is MLK day in the US, but let me talk to some people and
>> get back to you around Wednesday. GSoC is a great opportunity to get new
>> contributors to a project. I'll talk to a few people to figure out how we
>> could prioritize this during January.
>>
>> henrik
>>
>> On Fri, Jan 14, 2022 at 2:52 PM Paulo Motta 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I'd like to follow-up on this as GSoC timeline starts soon in February
>>> [1].
>>>
>>> We didn't get any project ideas tagged with the 'gsoc' label in the past
>>> 2 months, except Aleksandr Sorokoumov who sent me some tickets for
>>> offline triaging. This may indicate this model of having potential mentors
>>> register project ideas may not be working too well.
>>>
>>> I'd like to propose having students triage unassigned tickets and
>>> propose projects to potential mentors in the mailing list.
>>>
>>> We could create a blog post with a "Call for GSoC Projects" with
>>> instructions on how to triage tickets on JIRA and propose project ideas in
>>> the mailing list, and a twitter campaign pointing to the blog post.
>>>
>>> What do you think?
>>>
>>> Paulo
>>>
>>> [1] - https://developers.google.com/open-source/gsoc/timeline
>>>
>>> On Fri, 12 Nov 2021 at 11:30, Paulo Motta <
>>> pauloricard...@gmail.com> wrote:
>>>
>>>> We made an announcement about GSoC on our twitter account (@cassandra)
>>>> with a call to action for potentially interested people to reach out on
>>>> #cassandra-dev.
>>>>
>>>> I kindly ask everyone to refer prospective GSoC contributors to LHF
>>>> tickets and getting started pages so they can get involved with the project
>>>> before the official project ideas are released.
>>>>
>>>> Also, please retweet this to get the word out:
>>>> https://twitter.com/cassandra/status/1459163741002645507
>>>>
>>>> Cheers,
>>>>
>>>> Paulo
>>>>
>>>> On Fri, 12 Nov 2021 at 08:24, Berenguer Blasi <
>>>> berenguerbl...@gmail.com> wrote:
>>>>
>>>>> Agreed thx a lot.
>>>>>
>>>>> On 12/11/21 10:02, Benjamin Lerer wrote:
>>>>> > Thanks a lot Paulo for pushing that forward. That is a great way to
>>>>> grow
>>>>> > our community.
>>>>> >
>>>>> > On Thu, 11 Nov 2021 at 14:32, Paulo Motta wrote:
>>>>> >
>>>>> >> Hi,
>>>>> >>
>>>>> >> The Google Summer of Code organization announced some exciting
>>>>> changes to
>>>>> >> the program next year [1]:
>>>>> >> (1) Starting in 2022, the program will be open to all newcomers of
>>>>> open
>>>>> >> source that are 18 years and older, no longer focusing solely on
>>>>> university
>>>>> >> students.
>>>>> >> (2) GSoC Contributors will be able to choose from multiple size
>>>>> projects
>>>>> >> ~175 hour (medium) and 350 hour (large).
>>>>> >> (3) We are building increased flexibility around the timing of
>>>>> projects -
>>>>> >> there is an option to extend the standard 12 week coding time frame
>>>>> to a
>>>>> >> maximum of 22 weeks.

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-02 Thread Henrik Ingo
>> …on [a global memory limit] instead of the per-index memory limit used
>> by SASI. For example, compactions are running on 10 tables and each has
>> 10 indexes. SAI will cap the memory usage with the global limit while
>> SASI may use up to 100 * the per-index limit.
>>
>> * After flushing in-memory segments to disk, SAI won't merge on-disk
>> segments while SASI attempts to merge them at the end.
>> There are pros and cons of not merging segments:
>> ** Pros: compaction runs faster and requires fewer resources.
>> ** Cons: small segments reduce compression ratio.
>> * SAI on-disk format with row ids compresses better.
>>
>>> I understand the desire in keeping out of scope the longer term
>>> deprecation and migration plan, but… if SASI provides functionality
>>> that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet
>>> introduces a body of code ~somewhat similar, shouldn't we be roughly
>>> sketching out how to reduce the maintenance surface area?
>>
>> Agreed that we should reduce the maintenance area if possible, but only
>> a very limited part of the code base (eg. RangeIterator, QueryPlan) can
>> be shared. The rest of the code base is quite different because of the
>> on-disk format and cross-index files.
>>
>> The goal of this CEP is to get community buy-in on SAI's design.
>> Tokenization and DelimiterAnalyzer should be straightforward to
>> implement on top of SAI.
>>
>>> Can we list what configurations of SASI will become deprecated once
>>> SAI becomes non-experimental?
>>
>> Except for "Like", "Tokenisation" and "DelimiterAnalyzer", the rest of
>> SASI can be replaced by SAI.
>>
>>> Given a few bugs are open against 2i and SASI, can we provide some
>>> overview, or rough indication, of how many of them we could "triage
>>> away"?
>>
>> I believe most of the known bugs in 2i/SASI either have been addressed
>> in SAI or don't apply to SAI.
>>
>>> And, is it time for the project to start introducing new SPI
>>> implementations as separate sub-modules and jar files that are only
>>> loaded at runtime based on configuration settings? (sorry for the
>>> conflation on this one, but maybe it's the right time to raise it
>>> :shrug:)
>>
>> Agreed that modularization is the way to go and will speed up module
>> development. Does the community plan to open another discussion or CEP
>> on modularization?
>>
>> On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <m...@apache.org> wrote:
>>
>>> Adding to Duy's questions…
>>>
>>> * Hardware specs
>>>
>>> SASI's performance, specifically the search in the B+ tree component,
>>> depends a lot on the component file's header being available in the
>>> pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI
>>> bound to this same or similar limitation?
>>>
>>> Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>>> pauses, and crashes on the node. SSDs are a must, along with a bit of
>>> tuning, just to avoid bringing down your cluster. Beyond reducing
>>> space requirements, does SAI improve on these things? Like SASI, how
>>> does SAI, in its own way, change/narrow the recommendations on node
>>> hardware specs?
>>>
>>> * Code Maintenance
>>>
>>> I understand the desire in keeping out of scope the longer term
>>> deprecation and migration plan, but… if SASI provides functionality
>>> that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet
>>> introduces a body of code ~somewhat similar, shouldn't we be roughly
>>> sketching out how to reduce the maintenance surface area?
>>>
>>> Can we list what configurations of SASI will become deprecated once
>>> SAI becomes non-experimental?
>>>
>>> Given a few bugs are open against 2i and SASI, can we provide some
>>> overview, or rough indication, of how many of them we could "triage
>>> away"?
>>>
>>> And, is it time for the project to start introducing new SPI
>>> implementations as separate sub-modules and jar files that are only
>>> loaded at runtime based on configuration settings? (sorry for the
>>> conflation on this one, but maybe it's the right time to raise it
>>> :shrug:)
>>>
>>> regards,
>>> Mick
>>>
>>> On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>>> Thank you Zhao Yang for starting this topic
>>>>
>>>> After reading the short design doc, I have a few questions
>>>>
>>>> 1) SASI was pretty inefficient indexing wide partitions because the
>>>> index structure only retains the partition token, not the clustering
>>>> columns. As per the design doc SAI has a row id mapping to partition
>>>> offset, can we hope that indexing wide partitions will be more
>>>> efficient with SAI? One detail that worries me is that in the
>>>> beginning of the design doc, it is said that the matching rows are
>>>> post-filtered while scanning the partition. Can you confirm or infirm
>>>> that SAI is efficient with wide partitions and provides the partition
>>>> offsets to the matching rows?
>>>>
>>>> 2) About space efficiency, one of the biggest drawbacks of SASI was
>>>> the huge space required for the index structure when using CONTAINS
>>>> logic because of the decomposition of text columns into n-grams. Will
>>>> SAI suffer from the same issue in future iterations? I'm anticipating
>>>> a bit
>>>>
>>>> 3) If I'm querying using SAI and providing the complete partition
>>>> key, will it be more efficient than querying without the partition
>>>> key? In other words, does SAI provide any optimisation when the
>>>> partition key is specified?
>>>>
>>>> Regards
>>>>
>>>> Duy Hai DOAN
>>>>
>>>> On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org>
>>>> wrote:
>>>>
>>>>> We are looking forward to the community's feedback and suggestions.
>>>>>
>>>>> What comes immediately to mind is testing requirements. It has been
>>>>> mentioned already that the project's testability and QA guidelines
>>>>> are inadequate to successfully introduce new features and
>>>>> refactorings to the codebase. During the 4.0 beta phase this was
>>>>> intended to be addressed, i.e. defining more specific QA guidelines
>>>>> for 4.0-rc. This would be an important step towards QA guidelines
>>>>> for all changes and CEPs post-4.0.
>>>>>
>>>>> Questions from me
>>>>>
>>>>> - How will this be tested, how will its QA status and lifecycle be
>>>>> defined? (per above)
>>>>> - With existing C* code needing to be changed, what is the proposed
>>>>> plan for making those changes ensuring maintained QA, e.g. are
>>>>> there separate QA cycles planned for altering the SPI before adding
>>>>> a new SPI implementation?
>>>>> - Despite being out of scope, it would be nice to have some idea
>>>>> from the CEP author of when users might still choose afresh 2i or
>>>>> SASI over SAI,
>>>>> - Who fills the roles involved? Who are the contributors in this
>>>>> DataStax team? Who is the shepherd? Are there other stakeholders
>>>>> willing to be involved?
>>>>> - Is there a preference to use gdoc instead of the project's wiki,
>>>>> and why? (the CEP process suggests a wiki page, and feedback on why
>>>>> another approach is considered better helps evolve the CEP process
>>>>> itself)
>>>>>
>>>>> cheers,
>>>>> Mick
>>
>> --
>> alex p

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [GSOC] Call for Mentors

2022-02-02 Thread Henrik Ingo
Hi Paulo

I think Shaunak and Aleks V already pinged you on Slack about their ideas.
When you say we don't have any subscribed ideas, what is missing?

henrik

On Wed, Feb 2, 2022 at 4:03 PM Paulo Motta  wrote:

> Hi everyone,
>
> We need to tell ASF how many slots we will need for GSoC (if any) by
> February 20. So far we don't have any subscribed project ideas.
>
> If you are interested in being a GSoC mentor, just ping me on slack and I
> will be happy to give you feedback on the project idea proposal. Please do
> so by no later than February 10 to allow sufficient time for follow-ups.
>
> Cheers,
>
> Paulo
>
> On Wed, 19 Jan 2022 at 10:54, Paulo Motta
> wrote:
>
>> Hi everyone,
>>
>> Following up from the initial GSoC Kick-off thread [1] I would like to
>> invite contributors to submit GSoC project ideas. In order to submit a
>> project idea, just tag a JIRA ticket with the "gsoc" label and add yourself
>> to the "Mentor" field to indicate you're willing to mentor this project.
>>
>> Existing JIRA tickets can be repurposed as GSoC projects or new tickets
>> can be created with new features or improvements specifically for GSoC. The
>> best GSoC project ideas are those which are self-contained: have a well
>> defined scope, discrete milestones and definition of done. Generally the
>> areas which are easier for GSoC contributors to get started are:
>> - UX improvements
>> - Tools
>> - Benchmarking
>> - Refactoring and Modularization
>>
>> Non-committers are more than welcome to submit project ideas and mentor
>> projects, as long as a committer is willing to co-mentor the project. As a
>> matter of fact I was a GSoC mentor before becoming a committer, so I can
>> say this is a great way to pave your way to committership. ;)
>>
>> Mentor tasks involve having 1 or 2 weekly meetings with the GSoC
>> participant to track the project status and give guidance to the
>> participant towards the completion of the project, as well as reviewing
>> code submissions.
>>
>> This year, GSoC is open to any participant over 18 years of age, no
>> longer focusing solely on university students. GSoC projects can be of ~175
>> hour (medium) and 350 hour (large), and can range from 12 to 22 weeks
>> starting in July.
>>
>> We have little less than 2 months until the start of the GSoC application
>> period on March 7, but ideally we want to have an "Ideas List" ready before
>> that so prospective participants can start engaging with the project and
>> working with mentors to refine the project before submitting an application.
>>
>> This year I will not be able to participate as a primary mentor but I
>> would be happy to co-mentor other projects as well as help with questions
>> and guidance.
>>
>> Kind regards,
>>
>> Paulo
>>
>> [1] https://lists.apache.org/thread/58v2bvfzwtfgqdx90qmm4tmyoqzsgtn4
>>
>

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>

[image: Visit us online.] <https://www.datastax.com/>  [image: Visit us on
Twitter.] <https://twitter.com/DataStaxEng>  [image: Visit us on YouTube.]
<https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=>
  [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>


Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-07 Thread Henrik Ingo
Thanks Benjamin for reviewing and raising this.

While I don't speak for the CEP authors, just some thoughts from me:

On Mon, Feb 7, 2022 at 11:18 AM Benjamin Lerer  wrote:

> I would like to raise 2 points regarding the current CEP proposal:
>
> 1. There are mentions of some target versions and of the removal of SASI
>
> At this point, we have not agreed on any version numbers and I do not feel
> that removing SASI should be part of the proposal for now.
> It seems to me that we should see first the adoption surrounding SAI
> before talking about deprecating other solutions.
>
>
This seems rather uncontroversial. I think the CEP template and previous
CEPs invite the discussion on whether the new feature will or may replace
an existing feature. But at the same time that's of course out of scope for
the work at hand. I have no opinion one way or the other myself.



> 2. OR queries
>
> It is unclear to me if the proposal is about adding OR support only for
> SAI index or for other types of queries too.
> In the past, we had the nasty habit for CQL to provide only partially
> implemented features which resulted in a bad user experience.
> Some examples are:
> * LIKE restrictions which were introduced for the needs of SASI and were
> never supported for other types of queries
> * IS NOT NULL restrictions for MATERIALIZED VIEWS that are not supported
> elsewhere
> * != operator only supported for conditional inserts or updates
> And there are unfortunately many more.
>
> We are currently slowly trying to fix those issues and make CQL a more
> mature language. As a consequence, I would like us to change our way of
> doing things. If we introduce support for OR it should also cover all the
> other types of queries and be fully tested.
> I also believe that it is a feature that due to its complexity fully
> deserves its own CEP.
>
>
The current code that would be submitted for review after the CEP is
adopted, contains OR support beyond just SAI indexes. An initial
implementation first targeted only such queries where all columns in a
WHERE clause using OR needed to be backed by an SAI index. This was since
extended to also support ALLOW FILTERING mode as well as OR with clustering
key columns. The current implementation is by no means perfect as a general
purpose OR support, the focus all the time was on implementing OR support
in SAI. I'll leave it to others to enumerate exactly the limitations of the
current implementation.

Seeing that also Benedict supports your point of view, I would steer the
conversation more into a project management perspective:
* How can we advance CEP-7 so that the bulk of the SAI code can still be
added to Cassandra, so that users can benefit from this new index type,
albeit without OR?
* This is also an important question from the point of view that this is a
large block of code that will inevitably diverge if it's not in trunk.
Also, merging it to trunk will allow future enhancements, including the OR
syntax btw, to happen against trunk (aka upstream first).
* Since OR support nevertheless is a feature of SAI, it needs to be at
least unit tested, but ideally even would be exposed so that it is possible
to test on the CQL level. Is there some mechanism such as experimental
flags, which would allow the SAI-only OR support to be merged into trunk,
while a separate CEP is focused on implementing "proper" general purpose OR
support? I should note that there is no guarantee that the OR CEP would be
implemented in time for the next release. So the answer to this point needs
to be something that doesn't violate the desire for good user experience.

henrik


Re: [GSOC] Call for Mentors

2022-02-11 Thread Henrik Ingo
Hi Paulo

Just checking, am I using Jira right:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20gsoc%20and%20statusCategory%20!%3D%20Done%20

It looks like we ended up with no gsoc projects submitted? Or am I querying
wrong?

henrik

On Thu, Feb 3, 2022 at 12:26 AM Paulo Motta 
wrote:

> Hi Henrik,
>
> I am happy to give feedback to project ideas - but they ultimately need to
> be registered by prospective mentors on JIRA with the "gsoc" tag to be
> considered a "subscribed idea".
>
> The project idea JIRA should have a "high level" overview of what the
> project is:
> - What is the problem statement?
> - Rough plan on how to approach the problem.
> - What are the main milestones/deliverables? (ie.
> code/benchmark/framework/blog post etc)
> - What prior knowledge is required to complete the task?
> - What warm-up tasks can the candidate do to ramp up for the project?
>
> The mentor will work with potential participants to refine the high level
> description into smaller subtasks at a later stage (during candidate
> application period).
>
> Cheers,
>
> Paulo
>
> On Wed, 2 Feb 2022 at 19:02, Henrik Ingo
> wrote:
>
>> Hi Paulo
>>
>> I think Shaunak and Aleks V already pinged you on Slack about their
>> ideas. When you say we don't have any subscribed ideas, what is missing?
>>
>> henrik
>>
>> On Wed, Feb 2, 2022 at 4:03 PM Paulo Motta 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> We need to tell ASF how many slots we will need for GSoC (if any) by
>>> February 20. So far we don't have any subscribed project ideas.
>>>
>>> If you are interested in being a GSoC mentor, just ping me on slack and
>>> I will be happy to give you feedback on the project idea proposal. Please
>>> do so by no later than February 10 to allow sufficient time for follow-ups.
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>> On Wed, 19 Jan 2022 at 10:54, Paulo Motta
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Following up from the initial GSoC Kick-off thread [1] I would like to
>>>> invite contributors to submit GSoC project ideas. In order to submit a
>>>> project idea, just tag a JIRA ticket with the "gsoc" label and add yourself
>>>> to the "Mentor" field to indicate you're willing to mentor this project.
>>>>
>>>> Existing JIRA tickets can be repurposed as GSoC projects or new tickets
>>>> can be created with new features or improvements specifically for GSoC. The
>>>> best GSoC project ideas are those which are self-contained: have a well
>>>> defined scope, discrete milestones and definition of done. Generally the
>>>> areas which are easier for GSoC contributors to get started are:
>>>> - UX improvements
>>>> - Tools
>>>> - Benchmarking
>>>> - Refactoring and Modularization
>>>>
>>>> Non-committers are more than welcome to submit project ideas and mentor
>>>> projects, as long as a committer is willing to co-mentor the project. As a
>>>> matter of fact I was a GSoC mentor before becoming a committer, so I can
>>>> say this is a great way to pave your way to committership. ;)
>>>>
>>>> Mentor tasks involve having 1 or 2 weekly meetings with the GSoC
>>>> participant to track the project status and give guidance to the
>>>> participant towards the completion of the project, as well as reviewing
>>>> code submissions.
>>>>
>>>> This year, GSoC is open to any participant over 18 years of age, no
>>>> longer focusing solely on university students. GSoC projects can be of ~175
>>>> hour (medium) and 350 hour (large), and can range from 12 to 22 weeks
>>>> starting in July.
>>>>
>>>> We have little less than 2 months until the start of the GSoC
>>>> application period on March 7, but ideally we want to have an "Ideas List"
>>>> ready before that so prospective participants can start engaging with the
>>>> project and working with mentors to refine the project before submitting an
>>>> application.
>>>>
>>>> This year I will not be able to participate as a primary mentor but I
>>>> would be happy to co-mentor other projects as well as help with questions
>>>> and guidance.
>>>>
>>>> Kind regards,
>>>>

Re: [DISCUSS] CEP-7 Storage Attached Index

2022-02-14 Thread Henrik Ingo
On Fri, Feb 11, 2022 at 8:47 PM Caleb Rackliffe 
wrote:

> Just finished reading the latest version of the CEP. Here are my thoughts:
>
> - We've already talked about OR queries, so I won't rehash that, but
> tokenization support seems like it might be another one of those places
> where we can cut scope if we want to get V1 out the door. It shouldn't be
> that hard to detangle from the rest of the code.
>

The tokenization support is already implemented. It's available in our
public fork but, as of the last time I was involved, there isn't really any
public documentation. Lucene comes with dozens of tokenizers so the
documentation effort will be significant.

So the situation is similar to OR: The community may want to break out a
separate CEP to debate the user facing syntax. Alternatively, this can
simply happen as part of the PR that could be submitted as soon as CEP-7 is
approved.
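
For reference, creating an SAI index in the existing DSE/Astra
implementation looks roughly like the following. This is a sketch: the
option names are from that implementation, and exactly what lands upstream
is an assumption at this point.

CREATE CUSTOM INDEX val_idx ON ks.tbl (val)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true'};

The tokenization support discussed above would add analyzer options on top
of this, which is where the documentation effort for the dozens of Lucene
tokenizers comes in.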



> - We mention the JMX metric ecosystem in the CEP, but not the related
> virtual tables. This isn't a big issue, and doesn't mean we need to change
> the CEP, but it might be helpful for those not familiar with the existing
> prototype to know they exist :)
>

Thanks for the callout. Maybe they should indeed be mentioned together.


> - It's probably below the line for CEP discussion, but the text and
> numeric index formats will probably change over time. We don't need a whole
> "codec framework" for V1, but we're still embedding some versioning
> information in the column index on-disk structures, right?
>
>
On the contrary, this is a very valid question. As you know SAI has been GA
for over a year in both our DSE and Astra products, and what is described
in CEP-7 for inclusion in Cassandra is known to the SAI team as V2. (But
to be clear, it's named V1 in the CEP and in the context of Cassandra!) So
the code does contain facilities to support multiple generations of index
formats. If an sstable of an older version is encountered, the relevant
code is used to read its index files. Upon compaction the newer
version is written. And there needs to be some kind of global check to know
that new features are only available once all sstables cluster wide are of
the required version.


> To offset my obvious partiality around this CEP, I've already made an
> effort to raise some of the issues that may come up to challenge us from a
> macro perspective. It seems like the prevailing opinion here is that they
> are either surmountable or simply basic conceptual difficulties w/
> distributed secondary indexing.
>
>
This might be a good moment to say that we really appreciate your
investment and support in this CEP!

henrik


Pluggability improvements in 4.1

2022-04-26 Thread Henrik Ingo
Hi all

As one would expect, I've been involved in several discussions lately on
what is going to make it into 4.1, versus what patches unfortunately won't.

In particular, while debating this with Patrick McFadden, we realized that a big
theme in 4.1 appears to be a huge number of pluggability improvements. So
the intent of this email is to take an inventory of all new plugin APIs I'm
aware of, and invite the community to add to the list where I'm not aware
of some work.


CEP-9
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-9%3A+Make+SSLContext+creation+pluggable>
Pluggable SSLContext. Allows storing SSL certs and secrets somewhere other
than in files. Supplies an example implementation that stores them as a
Kubernetes Secret.
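
As a sketch of how this surfaces in cassandra.yaml (the nested key and the
PEM-based factory class name are from the 4.1 codebase as I recall; treat
the exact shape as an assumption):

client_encryption_options:
    ssl_context_factory:
        class_name: org.apache.cassandra.security.PEMBasedSslContextFactory

Swapping in the Kubernetes Secret implementation is then just a matter of
pointing class_name at that factory.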


*CEP-10*
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations>
* SimpleCondition, Semaphore, CountDownLatch, BlockingQueue, etc
* Executors, futures, starting threads, etc - including important
improvements to consistency of approach in the codebase
* The use of currentTimeMillis and nanoTime
* The replacement of java.io.File with a wrapper on java.nio.file.Path
providing an ergonomic API, and some improvements to consistency of file
handling
* Support for alternative streaming implementations
* Improvements to the dtest API to support necessary functionality

Commentary: Of the above, at least the Path and alternative streaming
implementations seem like significant APIs that can be used for much more
than just fault injection. In fact, I believe java.nio.file.Path is what
we use in Astra Serverless to send files to S3 instead of the local
filesystem.


*CEP-11*
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations>
Pluggable memtable

Commentary: While we won't have any new memtable implementations in 4.1, it
is a goal to merge the memtable API this week. Notably, since this is
designed to also support persistent memtables (i.e. memtables on persistent
memory), this new API could essentially be seen as a full-blown storage
engine API.
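
For a flavour of how this could surface to users, selecting an alternative
implementation might become a per-table option along these lines (a sketch
only; the 'memtable' option name and the configuration value here are
illustrative assumptions, not the merged API):

ALTER TABLE ks.tbl WITH memtable = 'sharded_skiplist';

The idea being that operators could experiment per table without touching
the rest of the storage engine.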


*CASSANDRA-17044* <https://issues.apache.org/jira/browse/CASSANDRA-17044>
Pluggable schema management

I hear rumors someone may be working on a new schema management
implementation?


(Just for completeness, CASSANDRA-17058
<https://issues.apache.org/jira/browse/CASSANDRA-17058> pluggable cluster
membership is not merged.)

CEP-16
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-16%3A+Auth+Plugin+Support+for+CQLSH>
While client side, worth mentioning: Pluggable auth for CQLSH



If there are more that I don't know about, please reply and add to the list.

henrik



Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Henrik Ingo
 another example, performing the canonical bank transfer:
>
> BEGIN TRANSACTION;
>   UPDATE accounts SET balance += 100 WHERE name='blake' AS blake;
>   UPDATE accounts SET balance -= 100 WHERE name='benedict' AS benedict;
> COMMIT TRANSACTION IF blake EXISTS AND benedict.balance >= 100;
>
> As you can see from the examples, column values can be referenced via a
> dot syntax, ie: <statement name>.<column name> -> select1.value. Since the read
> portion of the transaction is performed before evaluating conditions or
> applying updates, values read can be freely applied to non-primary key
> values in updates. Select statements used either in checking a condition or
> creating an update must be restricted to a single row, either by specifying
> the full primary key or a limit of 1. Multi-row selects are allowed, but
> only for returning data to the client (see below).
>
> For evaluating conditions, = & != are available for all types, <, <=, >,
> >= are available for numerical types, and EXISTS, NOT EXISTS can be used
> for partitions, rows, and values. If any column references cannot be
> satisfied by the result of the reads, the condition implicitly fails. This
> prevents having to include a bunch of exists statements.
>
> On completion, an operation would return a boolean value indicating the
> operation had been applied, and a result set for each named select (but not
> named update). We could also support an optional RETURN keyword, which
> would allow the user to only return specific named selects (ie: RETURN
> select1, select2).
>
> Let me know what you think!
>
> Blake
>




Re: CEP-15 multi key transaction syntax

2022-06-06 Thread Henrik Ingo
On Mon, Jun 6, 2022 at 5:28 PM bened...@apache.org 
wrote:

> > One way to make it obvious is to require the user to explicitly type
> the SELECTs and then to require that all SELECTs appear before
> UPDATE/INSERT/DELETE.
>
>
>
> Yes, I agree that SELECT statements should be required to go first.
>
>
>
> However, I think this is sufficient and we can retain the shorter format
> for RETURNING. There only remains the issue of conditions imposed upon
> UPDATE/INSERT/DELETE statements when there are multiple statements that
> affect the same primary key. I think we can (and should) simply reject such
> queries for now, as it doesn’t make much sense to have multiple statements
> for the same primary key in the same transaction.
>
>
I guess I was thinking ahead to a future where an UPDATE's write set may or
may not intersect with a previous update's, due to allowing the WHERE clause
to use secondary keys, etc.

That said, I'm not saying we SHOULD require explicit SELECT statements for
every update. I'm sure that would be more annoying than useful. I was just
following a train of thought.



>
>
> > Returning the "result" from an UPDATE presents the question should it
> be the data at the start of the transaction or end state?
>
>
>
> I am inclined to only return the new values (as proposed by Alex) for the
> purpose of returning new auto-increment values etc. If you require the
> prior value, SELECT is available to express this.
>
>
That's a great point!
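
To make sure I read it right: capturing the prior value would then be
expressed as (a sketch in the syntax proposed earlier in this thread; the
AS naming and the read-before-write semantics are from Blake's description):

BEGIN TRANSACTION;
  SELECT balance FROM accounts WHERE name='blake' AS before;
  UPDATE accounts SET balance += 100 WHERE name='blake';
COMMIT TRANSACTION IF before EXISTS;

...and the named select "before" would come back to the client as its own
result set, holding the pre-update balance.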


>
>
> > I was thinking the following coordinator-side implementation would
> allow to use also old drivers
>
>
>
> I am inclined to return just the first result set to old clients. I think
> it’s fine to require a client upgrade to get multiple result sets.
>
>
Possibly. I just wanted to share an idea for consideration. IMO the temp
table idea might not be too hard to implement*, but sure the syntax does
feel a bit bolted on.

*) I'm maybe the wrong person to judge that, of course :-)

henrik



[DISCUSS] Cassandra ecosystem site

2022-07-04 Thread Henrik Ingo
Hi all

One thing that is common to all successful large open source projects is
that they inevitably grow a huge ecosystem of plugins, addons or supporting
tools that are independent of the official core project; addons.mozilla.org
and pypi.org are two examples of what I mean.

Some time ago I stumbled onto flink-packages.org. As you might guess, it's
a catalogue of 3rd-party plugins for Apache Flink. The source code is
available at https://github.com/ververica/flink-ecosystem so I decided to
fork it, do s/Flink/Cassandra/, and see where that would bring me.

It ended up like this: http://35.184.128.58/ and
https://github.com/henrikingo/flink-ecosystem

With the help of some colleagues we've populated the site with the entries
from https://cassandra.apache.org/_/ecosystem.html plus some other things
we could think of. The point of the site is that anyone is welcome to
authenticate with their GitHub account and submit an entry with links to
their favorite Cassandra addon or tool. The idea is that, for example, the
developer of a tool is welcome to add and maintain their own entry.

Currently the site is in my personal GCP account. The proposal - which I
would like to hear discussion and feedback on - is essentially to replace
https://cassandra.apache.org/_/ecosystem.html with this site. It could live
at something like ecosystem.cassandra.apache.org for example. The goal of
this would be to give more visibility to software that is part of the
Cassandra ecosystem in the wider/widest sense. For example, in addition to
launching this site, we could feature individual tools/addons in blogs and
social media.

I'm happy to share admin access with a few additional volunteers, or for
that matter to move the site to a server I don't need to pay for and
maintain. If we do want to start using this, maybe an immediate need would
be to set up a backup of the MySQL database that hosts the contents of the
site.

henrik



Re: Thanks to Nate for his service as PMC Chair

2022-07-14 Thread Henrik Ingo
Thank you Nate for holding the baton for all these years. Even as a
relative newcomer (2+ years already) I wanted to say I do understand and
appreciate your role in carrying the torch to where the project is today.

And congratulations, Mick. Your humble and quiet style of serving the
project is something I and many others can look up to. Thank you for all
the time and energy you bring to Cassandra.

henrik

On Mon, Jul 11, 2022 at 3:55 PM Paulo Motta  wrote:

> Hi,
>
> I wanted to announce on behalf of the Apache Cassandra Project Management
> Committee (PMC) that Nate McCall (zznate) has stepped down from the PMC
> chair role. Thank you Nate for all the work you did as the PMC chair!
>
> The Apache Cassandra PMC has nominated Mick Semb Wever (mck) as the new
> PMC chair. Congratulations and good luck on the new role Mick!
>
> The chair is an administrative position that interfaces with the Apache
> Software Foundation Board, by submitting regular reports about project
> status and health. Read more about the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>
> The PMC as a whole is the entity that oversees and leads the project and
> any PMC member can be approached as a representative of the committee. A
> list of Apache Cassandra PMC members can be found on:
> https://cassandra.apache.org/_/community.html
>
> Kind regards,
>
> Paulo
>




Re: [Marketing] For Review: Pluggable Memtable Implementations blog

2022-07-14 Thread Henrik Ingo
Thanks. This is concise and well written, and it's already exciting to see
the sharded skip list results, with a promising "more to come" at the end.
This is - I believe - a big deal for Cassandra, and therefore I appreciate
that this blog post will be published.

henrik

On Wed, Jul 13, 2022 at 6:36 PM Chris Thornett  wrote:

> The 72-hour window for community review is now open for Branimir Lambov's
> blog on *Pluggable Memtable Implementations*. Please indicate any amends
> in the comments. Thanks!
>
> https://docs.google.com/document/d/1Iws9gQZCp_80Be_ZfuyAB82X1EhqWihEb35Sevv-JDE/edit
> --
>
> Chris Thornett
>




Re: Cassandra project status update 2022-07-18

2022-07-19 Thread Henrik Ingo
On Mon, Jul 18, 2022 at 11:42 PM Josh McKenzie  wrote:


> [CI Trends]
> https://butler.cassandra.apache.org/#/
>
> The last three weeks show us this delightful trend:
>
> 3.0: 10 -> 10
> 3.11: 23 -> 15
> 4.0: 2 -> 1
> 4.1: 8 -> 10 -> 5
> trunk: 20 -> 5
>
> That's 36 total failures across all our branches! This is a new low
> watermark for us; we're getting close to both a 4.1 RC as well as needing
> to start serious discussions about potential improvements to our workflows
> to keep these test boards green. Knocking down flaky test failures
> continues to be a high-investment activity for the project; thank you as
> always to everyone taking the time to drive CI back to health!
> 
>

This is exciting to see and thanks Josh for summarizing the stats in your
email.

Thanks to all the hard-working engineers whose joint efforts are now seen
in those numbers! And thanks again, Tomek, for your 2+ years of personal
dedication to Butler. This is yet another example of how a good tool helps
its users understand what they need to do, and then makes it easy to go and
do it.

Once those numbers go down to 0, I look forward to a discussion on how to
lock ourselves into some process where they will also stay that way. Butler
includes functionality that can help with various blocking merge policies;
on the other hand, once you get to a truly green build, you might not even
need Butler.

henrik


Re: [DISCUSS] Cassandra ecosystem site

2022-08-07 Thread Henrik Ingo
Thanks for sharing, Rahul

Although I knew about Cassandra.link, I somehow had not heard about
Cassandra.tools. I love the current Cassandra.link btw, I think the layout
flows nicely and it feels easier to browse the articles and news compared
to last time I looked.

As for Cassandra.tools, it indeed seems to be very similar to what I put
together. I agree that a static site generator is an option, since
contributors would typically be comfortable adding content by submitting
PRs with Markdown. On the other hand, what I feel is missing is the nice
grouping into categories that http://35.184.128.58/ has. That said, I do
like the structure and smooth flow of jamstackthemes.dev.

Do I read you correctly that you are open to collaborating on and building
upon cassandra.tools rather than http://35.184.128.58/? To be specific, is
it aligned with your goals for cassandra.tools that it would be a neutral
community site? (I actually don't know if it needs to be under any kind of
formal Cassandra project supervision, but either way it should be similar
in spirit to what content can currently go into
https://cassandra.apache.org/_/ecosystem.html.)

As a final comment... If we are going back to static site generators, I
guess I should mention for the record that simply evolving the current
https://cassandra.apache.org/_/ecosystem.html is an option too.

henrik

On Wed, Jul 27, 2022 at 6:59 PM Rahul Xavier Singh <
rahul.xavier.si...@gmail.com> wrote:

> Henrik
>
> What a great find! Love the filterability. Our team and I have been
> curating on https://cassandra.tools/
> for a few years on a Markdown based Static Generated Site. Right now it
> uses Gatsby. We had gotten the first UI from a jamstack Headless CMS
> listing a while ago.
>
> All the content is in Markdown files. We deliberately chose this so that
> in the worst case it can be hosted for free on any static hosting site in
> the world including Github Pages.
>
> There's another iteration of where we can find inspiration.  This is a
> Hugo generated site.
>
> https://github.com/stackbit/jamstackthemes
> This powers https://jamstackthemes.dev/
>
> All of the content is in markdown here as well
> https://github.com/stackbit/jamstackthemes/blob/master/content/services/formstack/_index.md
>
> This is the story of precaution.. the moment you have to depend on a
> database for any site, it becomes a service that has to be hosted, managed,
> cared for... I strongly recommend a static generator even if you continue
> to use a database.
>
> There's also a source of content from the DB that powers our
> https://cassandra.link
> which is also generated from Gatsby but is backed by a MySQL based app.
> We've backlogged moving it to be hosted on a Cassandra variant like Astra
> or otherwise, but you know as well as most people, this guy named Free Time
> is hard to get in touch with.
>


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-22 Thread Henrik Ingo
One thought: The way the CEP is currently written, it is only possible to
mask a column one way. You can only define one masking function for a
column, and since you use the original column name, you could only return
one version of it in the result set, even if you had a way to define
several functions.
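
To illustrate with the CEP's proposed syntax (a sketch; the MASKED WITH
clause and the mask_inner function are my reading of the current draft, so
treat the details as assumptions):

CREATE TABLE patients (
  id timeuuid PRIMARY KEY,
  name text MASKED WITH mask_inner(1, null),
  birth date
);

Every unprivileged reader would get name masked by that one function; there
is no way to declare a second masking for the same column.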

I'm not proposing this should change, just calling it out.

henrik

On Fri, Aug 19, 2022 at 2:50 PM Andrés de la Peña 
wrote:

> Hi everyone,
>
> I'd like to start a discussion about this proposal for dynamic data
> masking:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
>
> Dynamic data masking allows to obscure sensitive information without
> changing the stored data. It would be based on a set of native CQL
> functions providing different types of masking, such as replacing the
> column value by "". These functions could be used as regular functions
> or attached to table columns with CREATE/ALTER table. There would be a new
> UNMASK permission, so only users with this permission would be able to
> see the unmasked column values. It would be possible to customize masking
> by using UDFs as masking functions.
>
> Thanks,
>




Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-23 Thread Henrik Ingo
On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña 
wrote:

> One thought: The way the CEP is currently written, it is only possible to
>> mask a column one way. You can only define one masking function for a
>> column, and since you use the original column name, you could only return
>> one version of it in the result set, even if you had a way to define
>> several functions.
>>
>
> Right, it's one single type of mapping per the column, declared on
> CREATE/ALTER TABLE statements. Also, users can manually specify their own
> masking function in SELECT statements if they have permissions for seeing
> the clear data.
>
> For those cases where the data is automatically masked for an unprivileged
> user, I don't see the use of including different types of masking for the
> same column into the same result set. Instead, we might be interested on
> having different types of masking associated to different roles. We could
> do so with dedicated CREATE/DROP/LIST MASK statements, instead of using the
> CREATE/ALTER/DESCRIBE TABLE statements. That CREATE MASK statement would
> associate a masking function to a column and role. However, I'm not sure we
> need that type of granularity instead of the simplicity of attaching the
> masking to the column declaration. wdyt?
>
>
>
My gut feeling likewise is that this adds complexity but little value.
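
The ad-hoc cases seem covered anyway, since a user with permission to see
the clear data can always apply a masking function explicitly in the query
(a sketch, reusing the hypothetical patients table from above and assuming
mask_inner is also callable as a regular native function):

SELECT mask_inner(name, 1, null), birth FROM patients;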



Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Henrik Ingo
investigate, they will likely use a SUPERUSER account, and they'll
>>>> see that data.
>>>>
>>>> How hard would it be for SUPERUSERs to *not* automatically get the
>>>> UNMASK permission?
>>>>
>>>> I'll also echo the concerns around masking primary key components.
>>>> It's highly likely that certain personal data properties would be used as a
>>>> partition or clustering key (ex: range query for people born within a
>>>> certain timeframe).  In addition to the "breaks existing" concern, I'm
>>>> curious about the challenges around getting that to work with the current
>>>> primary key implementation.
>>>>
>>>> Does this first implementation only apply to payload (non-key)
>>>> columns?  The examples in the CEP currently do not show primary key
>>>> components being masked.
>>>>
>>>> Thanks,
>>>>
>>>> Aaron
>>>>
>>>>
>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo 
>>>> wrote:
>>>>
>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <
>>>>> adelap...@apache.org> wrote:
>>>>>
>>>>>> One thought: The way the CEP is currently written, it is only
>>>>>>> possible to mask a column one way. You can only define one masking 
>>>>>>> function
>>>>>>> for a column, and since you use the original column name, you could only
>>>>>>> return one version of it in the result set, even if you had a way to 
>>>>>>> define
>>>>>>> several functions.
>>>>>>>
>>>>>>
>>>>>> Right, it's one single type of mapping per the column, declared on
>>>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their own
>>>>>> masking function in SELECT statements if they have permissions for seeing
>>>>>> the clear data.
>>>>>>
>>>>>> For those cases where the data is automatically masked for an
>>>>>> unprivileged user, I don't see the use of including different types of
>>>>>> masking for the same column into the same result set. Instead, we might 
>>>>>> be
>>>>>> interested on having different types of masking associated to different
>>>>>> roles. We could do so with dedicated CREATE/DROP/LIST MASK statements,
>>>>>> instead of using the CREATE/ALTER/DESCRIBE TABLE statements. That CREATE
>>>>>> MASK statement would associate a masking function to a column and role.
>>>>>> However, I'm not sure we need that type of granularity instead of the
>>>>>> simplicity of attaching the masking to the column declaration. wdyt?
>>>>>>
>>>>>>
>>>>>>
>>>>> My gut feeling likewise is that this adds complexity but little value.


Re: [DISCUSS] CEP-21: Transactional Cluster Metadata

2022-09-05 Thread Henrik Ingo
Mostly I just wanted to ack that at least someone read the doc (somewhat
superficially, sure, but some parts with thought...).

One pre-feature that we would include in the preceding minor release is a
> node level switch to disable all operations that modify cluster metadata
> state. This would include schema changes as well as topology-altering
> events like move, decommission or (gossip-based) bootstrap and would be
> activated on all nodes for the duration of the *major* upgrade. If this
> switch were accessible via internode messaging, activating it for an
> upgrade could be automated. When an upgraded node starts up, it could send
> a request to disable metadata changes to any peer still running the old
> version. This would cost a few redundant messages, but simplify things
> operationally.
>
> Although this approach would necessitate an additional minor version
> upgrade, this is not without precedent and we believe that the benefits
> outweigh the costs of additional operational overhead.
>

Sounds like a great idea, and probably necessary in practice?


> If this part of the proposal is accepted, we could also include further
> messaging protocol changes in the minor release, as these would largely
> constitute additional verbs which would be implemented with no-op verb
> handlers initially. This would simplify the major version code, as it would
> not need to gate the sending of asynchronous replication messages on the
> receiver's release version. During the migration, it may be useful to have
> a way to directly inject gossip messages into the cluster, in case the
> states of the yet-to-be upgraded nodes become inconsistent. This isn't
> intended, so such a tool may never be required, but we have seen that
> gossip propagation can be difficult to reason about at times.
>

Others will know the code better, and I understand that adding new no-op
verbs can be considered safe... but instinctively I'm a bit hesitant on
this one. Surely adding a few if statements to the upgraded version isn't
that big of a deal?

Also, it makes sense to minimize the dependencies from the previous major
version (without CEP-21) on the new major version (with CEP-21). If a bug
is found, it's much easier to fix code in the new major version than in the
old and supposedly stable one.

henrik
