Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-14 Thread Claude Warren
Is there still interest in this?  Can we get some points down on electrons so 
that we all understand the issues?

While it is fairly simple to redirect reads/writes to something other than
the local system for a single node, this will not solve the problem for tiered
storage.

Tiered storage will require that on each read/write the primary key be assessed
to determine whether the operation should be redirected.  My reasoning for this
statement is that in a cluster with a replication factor greater than 1 the 
node will store data for the keys that would be allocated to it in a cluster 
with a replication factor = 1, as well as some keys from nodes earlier in the 
ring.

Even if we can get the primary keys for all the data we want to write to "cold 
storage" to map to a single node a replication factor > 1 means that data will 
also be placed in "normal storage" on subsequent nodes.

To overcome this, we have to explore ways to route data to different storage
based on the keys; that different storage may have to be available on _all_
the nodes.
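The routing decision described above might be sketched as follows. This is a minimal illustration, not a Cassandra API: TierRouter and all names here are hypothetical. The key point is that the tier must be derived from the partition key alone, so every replica makes the same decision.

```java
import java.nio.file.Path;
import java.util.function.Predicate;

// Hypothetical sketch of per-key tier routing; none of these names are Cassandra APIs.
final class TierRouter {
    private final Predicate<String> isCold; // decision made from the primary key alone
    private final Path hotDir;
    private final Path coldDir;

    TierRouter(Predicate<String> isCold, Path hotDir, Path coldDir) {
        this.isCold = isCold;
        this.hotDir = hotDir;
        this.coldDir = coldDir;
    }

    /** Every node must make the same decision, since replicas hold the same keys. */
    Path resolve(String partitionKey, String fileName) {
        Path base = isCold.test(partitionKey) ? coldDir : hotDir;
        return base.resolve(fileName);
    }
}
```

Because the predicate is a pure function of the key, the routing works identically on every node in the replica set regardless of replication factor.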

Have any of the partial solutions mentioned in this email chain (or others) 
solved this problem?

Claude


Implementing a secondary index

2021-11-17 Thread Claude Warren
Greetings,

I am looking to implement a Multidimensional Bloom filter index [1] [2] on
a Cassandra table.  OK, I know that is a lot to take in.  What I need is
any documentation that explains the architecture of the index options, or
someone I can ask questions of -- a mentor if you will.

I have a proof of concept for the index that works from the client side
[3].  What I want to do is move some of that processing to the server
side.

Basically, I think I need to do the following:

   1. On each partition create an SSTable to store the index data.  This
   table comprises 2 integer data points and the primary key for the data table.
   2. When the indexed cell gets updated in the original table (there will
   only be one column), update one or more rows in the SSTable.
   3. When querying perform multiple queries against the index data, and
   return the primary key values (or the data associated with the primary keys
   -- I am unclear on this bit).
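The three steps above can be modeled in miniature. This is a toy in-memory stand-in, not the Cassandra index API; all names here are hypothetical, and a real implementation would back the map with an SSTable:

```java
import java.util.*;

// Toy model of the three steps: index rows of (bucket, bitPattern) -> primary keys.
final class BloomIndexSketch {
    // step 1: the index "table", keyed by two integer data points
    private final Map<List<Integer>, Set<String>> index = new HashMap<>();

    // step 2: on write/update of the indexed cell, upsert one or more index rows
    void onCellUpdate(int bucket, int bitPattern, String primaryKey) {
        index.computeIfAbsent(List.of(bucket, bitPattern), k -> new HashSet<>())
             .add(primaryKey);
    }

    // step 3: a query fans out to several index lookups and unions the primary keys
    Set<String> query(List<List<Integer>> probes) {
        Set<String> result = new HashSet<>();
        for (List<Integer> probe : probes) {
            result.addAll(index.getOrDefault(probe, Set.of()));
        }
        return result;
    }
}
```

Whether step 3 returns the primary keys or resolves them to rows is the open question from the email; the sketch stops at primary keys.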

Any help or guidance would be appreciated,
Claude

[1] https://archive.org/details/arxiv-1501.01941/mode/2up
[2] https://archive.fosdem.org/2020/schedule/event/bloom_filters/
[3] https://github.com/Claude-at-Instaclustr/blooming_cassandra




-- 

[image: Instaclustr logo]


*Claude Warren*

Principal Software Engineer

Instaclustr


Can I create a secondary index with multiple SSTables

2021-12-09 Thread Claude Warren
I am building another secondary index implementation and am wondering if it
is possible to create two SSTables for the one index.  I assume I would
create two IndexCfs using the baseCfs for the table as shown in
org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata() but
would have to construct 2 different names for the call to
TableMetadata.builder().

Is there anything special I need to do for Cassandra to manage the SSTables
as per a normal index?



Re: [DISCUSS] Periodic snapshot publishing with minor version bumps

2021-12-31 Thread Claude Warren
I am late to this party but wanted to add my 2-cents.

I do not think that the minor revisions should be used to denote snapshot,
nightly build, or any other not-fully-supported code.  My reasoning is that
semantic versioning defines under which conditions the version numbers are
to change.  By looking at the version I can tell if it is a bug fix or
added functionality change.  My experience, both with the Apache Jena
project, and at my places of employment, is that version numbers are best
left to identify what type of change is being offered.  If the package is
not a release package then it should have some sort of version extension
(e.g. -SNAPSHOT, -RC1, etc).

In the Jena project we do not release on a clock/calendar based schedule,
rather we decide that there is enough change in the product and that it is
sufficiently tested and then we go through a release.  Before that it is
always just a SNAPSHOT.

My take on all of this is that anything versioned x.y.z should be a fully
supported release.  That means fully tested, fully documented, packaging
tested, the works.  Fully meet the expectations of an Apache package.  Any
version that has other bits at the end should be noted in the site
documentation as being not fully baked and perhaps an explanation of how
poorly baked they are.

Keep in mind that making a release is a time consuming process.  More
releases mean more time spent preparing and supporting the releases, more
time answering questions about differences between releases.

So, for me, this boils down to two things:

   1. Keep the version numbers clean and use suffixes to identify
   non-standard releases and level set expectations for those releases.
   2. Keep it simple.  Don't plan lots of releases unless there are lots of
   people that want to do the packaging and support.  Remember that automation
   never made anything easier, it just moved the pain point somewhere else.
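The suffix convention in point 1 can be checked mechanically. A minimal sketch, assuming the semver rule that any dash-suffixed build is a pre-release of the bare version (class and method names are illustrative):

```java
// Minimal sketch of the suffix rule: any "-suffix" build precedes the bare release.
final class VersionTag {
    /** True when 'candidate' is a pre-release of 'release', e.g. 3.3.0-SNAPSHOT vs 3.3.0. */
    static boolean isPreReleaseOf(String candidate, String release) {
        int dash = candidate.indexOf('-');
        return dash >= 0 && candidate.substring(0, dash).equals(release);
    }
}
```

Under this rule 3.3.0-SNAPSHOT, 3.3.0-RC1 and the like all identify themselves as not-yet-released builds of 3.3.0, while the bare 3.3.0 is the fully supported release.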

Claude

On Thu, 23 Dec 2021 at 00:28, bened...@apache.org 
wrote:

> > You were part of that slack thread, so it was a bad presumption on my
> behalf.
>
> I am flattered, but I’m sure your intention was in fact to involve
> everyone in this discussion. As it happens, I commented only on the end of
> that lengthy thread and did not participate in the section you linked, so
> was unaware of it – as I’m sure were most folk here.
>
>
>
> > the complaint raised by you is that it doesn't case-sensitively
> lexically order and undermines the proposal choice you want to see go
> forward
>
>
>
> Actually my complaint was more general, but I was letting another pet
> peeve of mine leak into this discussion. We should have a separate
> discussion around dependency policy in the new year. I think new
> dependencies should not be included without discussion on list, as they
> introduce significant new code to the project that is rarely audited even
> cursorily either on inclusion to the project or update. For such a trivial
> feature as this, that was adequately implemented in the project already, I
> consider the inclusion of a dependency to be a mistake.
>
>
>
> As it happens, I don’t think this problem you raise is a concern, even
> with this recently introduced faulty implementation of Semver. A2 is zero
> cost to implement, but even A1 would be fine without any work. It is
> unlikely we would ever need to compare a -pre version to -alpha or any
> other pre-release version, as we are unlikely to perform upgrade tests
> across these versions since we will have no users deploying them.
>
>
>
>
>
> *From: *Mick Semb Wever 
> *Date: *Wednesday, 22 December 2021 at 16:02
> *To: *
> *Cc: *dev@cassandra.apache.org 
> *Subject: *Re: [DISCUSS] Periodic snapshot publishing with minor version
> bumps
>
> > > Yeah, not described enough in this thread, it is part of the
> motivation to the proposal
> >
> > I don’t believe it has been mentioned once in this thread. This should
> have been clearly stated upfront as a motivation. Thus far no positive case
> has been made on this topic, we have instead wasted a lot of time
> discussing clearly peripheral topics, demonstrating that the more obvious
> approach for anyone without this motivation is indeed fine.
> >
>
>
> Apologies for not previously stating and explaining this additional
> motivation in this thread. You were part of that slack thread, so it
> was a bad presumption on my behalf.
>
>
> > > Setting up versioning to be extensible in this manner is not endorsing
> such artefacts and distributions.
> >
> > Yes, setting up versioning in this way with the intention of permitting
> comparisons between these “not Cassandra” releases and actual Cassandra
> releases is the same thing as endorsing this behaviour. It’s equally bad if
> this “internal” release is, say, used to support some cloud service that is
> advertised as Cassandra-compatible.
>
>
> We have many versionings at play, and they are used between codebases
> in our ecosystem. Forcing people to use their own versioning  may well

Re: [DISCUSS] Periodic snapshot publishing with minor version bumps

2022-01-03 Thread Claude Warren
Just to be very clear.

I think that bumping the semantic version number because some arbitrary
time constraint has passed is a bad idea.  Version numbers should be bumped
when dictated by the change to the code.  It makes sense that if you have
released 3.2.0 then the next nightly snapshot would be 3.3.0-SNAPSHOT (as
the assumption here is that you are adding functionality as we move
forward).  The Patch level changes should be reserved for cases where we
have to go back and fix something (Heartbleed, log4j issues come to mind).

Nightly builds, quarterly snapshots, anything that is not a "release" can
be released at any time but should be listed as <version>-<qualifier>.  For
example 3.3.0-Q1_2022.  If desired, tags can be placed in git to show when
those were built.  These are not "release" builds.

A "release" build has full testing (regression, integration, etc.),
documentation checks, packaging verification, etc.  No release build has
anything after  the semantic version.

I am not that familiar with how C* has been built in the past, but if each
component has its own cadence then it might make sense for each component
to have its own semantic version and component release.  C* release then
assembles the component versions necessary and correct for the release and
builds its own release with its own semantic version number.  The C*
release would include a list of all the component versions that were used
to build the package.  In this case C* views each of the components as a
separate project.

The idea of bumping version numbers when there is no change to the
underlying code goes against the semantic versioning concept.  If there is
no change to the code, then there should be no change to the version.

That is my 2cents (now probably 5cents) worth.
Claude

On Fri, 31 Dec 2021 at 08:42, Claude Warren 
wrote:

> I am late to this party but wanted to add my 2-cents.
>
> I do not think that the minor revisions should be used to denote snapshot,
> nightly build, or any other not-fully-supported code.  My reasoning is that
> semantic versioning defines under which conditions the version numbers are
> to change.  By looking at the version I can tell if it is a bug fix or
> added functionality change.  My experience, both with the Apache Jena
> project, and at my places of employment, is that version numbers are best
> left to identify what type of change is being offered.  If the package is
> not a release package then it should have some sort of version extension
> (e.g. -SNAPSHOT, -RC1, etc).
>
> In the Jena project we do not release on a clock/calendar based schedule,
> rather we decide that there is enough change in the product and that it is
> sufficiently tested and then we go through a release.  Before that it is
> always just a SNAPSHOT.
>
> My take on all of this is that anything versioned x.y.z should be a fully
> supported release.  That means fully tested, fully documented, packaging
> tested, the works.  Fully meet the expectations of an Apache package.  Any
> version that has other bits at the end should be noted in the site
> documentation as being not fully baked and perhaps an explanation of how
> poorly baked they are.
>
> Keep in mind that making a release is a time consuming process.  More
> releases mean more time spent preparing and supporting the releases, more
> time answering questions about differences between releases.
>
> So, for me, this boils down to two things:
>
>1. Keep the version numbers clean and use suffixes to identify
>non-standard releases and level set expectations for those releases.
>2. Keep it simple.  Don't plan lots of releases unless there are lots
>of people that want to do the packaging and support.  Remember that
>automation never made anything easier, it just moved the pain point
>somewhere else.
>
> Claude
>
> On Thu, 23 Dec 2021 at 00:28, bened...@apache.org 
> wrote:
>
>> > You were part of that slack thread, so it was a bad presumption on my
>> behalf.
>>
>> I am flattered, but I’m sure your intention was in fact to involve
>> everyone in this discussion. As it happens, I commented only on the end of
>> that lengthy thread and did not participate in the section you linked, so
>> was unaware of it – as I’m sure were most folk here.
>>
>>
>>
>> > the complaint raised by you is that it doesn't case-sensitively
>> lexically order and undermines the proposal choice you want to see go
>> forward
>>
>>
>>
>> Actually my complaint was more general, but I was letting another pet
>> peeve of mine leak into this discussion. We should have a separate
>> discussion around dependency policy in the new year. I think new
>> dependencies should not be included without

CQL Tuples & CQL Grammar

2022-03-09 Thread Claude Warren
I have been looking at  CqlParser.g4 file for cql3  and have a question
about assignment tuples.  The assignment tuple is defined as :

assignmentTuple
   : syntaxBracketLr (
 constant ((syntaxComma constant)* | (syntaxComma assignmentTuple)*) |
 assignmentTuple (syntaxComma assignmentTuple)*
 ) syntaxBracketRr
   ;

which I read to be ( constant [, constant | tuple ... ]) or ( tuple [,
tuple...]).  So the construct ((4, 5), 6, (7, 8)) is not a legal tuple.
Two questions:

   1.  Is my interpretation of the grammar correct?
   2. Is my example tuple supposed to be allowed?
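For reference, here is the construct written out against a hypothetical table. The column type is mine, chosen so the nested literal type-checks; only the literal itself comes from the question above:

```cql
-- Hypothetical table; the column type is chosen so the nested literal type-checks.
CREATE TABLE ks.t (
    id int PRIMARY KEY,
    v tuple<tuple<int, int>, int, tuple<int, int>>
);

-- The construct in question: constants and tuples mixed at the same nesting level.
INSERT INTO ks.t (id, v) VALUES (1, ((4, 5), 6, (7, 8)));
```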


Claude



Re: CQL Tuples & CQL Grammar

2022-03-10 Thread Claude Warren
I found it in https://github.com/antlr/grammars-v4.git and I suspected it
was wrong.  I assume it should be that the tuple can contain 1 or more
elements and the elements may be of type  tuple, constant, map, list or
set.

Does that make sense?  I think that is what I saw in the code base.

On Thu, 10 Mar 2022 at 09:38, Benjamin Lerer  wrote:

> Hi Claude,
>
> I am not aware of the CqlParser.g4 file in our code base. Where did you
> find that file?
>
> At first glance effectively something looks wrong in the syntax. The
> construct ((4 ,5 ), 6, (7, 8)) should be legal in CQL.
>
> Le jeu. 10 mars 2022 à 06:50, Claude Warren 
> a écrit :
>
>> I have been looking at  CqlParser.g4 file for cql3  and have a question
>> about assignment tuples.  The assignment tuple is defined as :
>>
>> assignmentTuple
>>: syntaxBracketLr (
>>  constant ((syntaxComma constant)* | (syntaxComma assignmentTuple)*) 
>> |
>>  assignmentTuple (syntaxComma assignmentTuple)*
>>  ) syntaxBracketRr
>>;
>>
>> which I read to be ( constant [, constant | tuple ... ]) or ( tuple [,
>> tuple...]).  So the construct ((4, 5), 6, (7, 8)) is not a legal tuple.
>> Two questions:
>>
>>1.  Is my interpretation of the grammar correct?
>>2. Is my example tuple supposed to be allowed?
>>
>>
>> Claude
>>
>>
>>
>>
>>


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-31 Thread Claude Warren via dev
Is there enough support here for VIEWS to be the implementation strategy 
for displaying masking functions?


It seems to me the view would have to store the query and apply a where 
clause to it, so the same PK would be in play.


It has data leaking properties.

It has more use cases as it can be used to

 * construct views that filter out sensitive columns
 * apply transforms to convert units of measure

Are there more thoughts along this line?


Re: [DISCUSS] LWT UPDATE semantics with + and - when null

2022-08-31 Thread Claude Warren via dev
I like this approach.  However, in light of some of the discussions on
views and the like, perhaps the function is (column value as returned by
select) + 42.


So a null counter column becomes 0 before the update calculation is applied.

Then any null can be considered null unless addressed by ifNull() or
zeroIfNull().


Any operation on null returns null.

I think this follows what would be expected by most users in most cases.
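Put as CQL, using the function names suggested in this thread (ifNull and zeroIfNull are proposals under discussion, not existing CQL functions, and the table is hypothetical):

```cql
-- Option 2 (SQL behaviour): null propagates through arithmetic.
UPDATE ks.t SET val = val + 42 WHERE pk = 1 IF EXISTS;
-- val was NULL -> val stays NULL

-- Proposed escape hatch: make the default explicit.
UPDATE ks.t SET val = zeroIfNull(val) + 42 WHERE pk = 1 IF EXISTS;
-- val was NULL -> treated as 0 -> val becomes 42
```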


On 31/08/2022 11:55, Andrés de la Peña wrote:
I think I'd prefer 2), the SQL behaviour. We could also get the 
convenience of 3) by adding CQL functions such as "ifNull(column, 
default)" or "zeroIfNull(column)", as it's done by other dbs. So we 
could do things like "UPDATE ... SET name = zeroIfNull(name) + 42".


On Wed, 31 Aug 2022 at 04:54, Caleb Rackliffe 
 wrote:


Also +1 on the SQL behavior here. I was uneasy w/ coercing to "" /
0 / 1 (depending on the type) in our previous discussion, but for
some reason didn't bring up the SQL analog :-|

On Tue, Aug 30, 2022 at 5:38 PM Benedict  wrote:

I’m a bit torn here, as consistency with counters is
important. But they are a unique eventually consistent data
type, and I am inclined to default standard numeric types to
behave as SQL does, since they write a new value rather than a
“delta”

It is far from optimal to have divergent behaviours, but also
suboptimal to diverge from relational algebra, and probably
special casing counters is the least bad outcome IMO.



On 30 Aug 2022, at 22:52, David Capwell 
wrote:


4.1 added the ability for LWT to support "UPDATE ... SET name
= name + 42", but we never really fleshed out with the larger
community what the semantics should be in the case where the
column or row are NULL; I opened up
https://issues.apache.org/jira/browse/CASSANDRA-17857 for
this issue.

As I see it there are 3 possible outcomes:
1) fail the query
2) null + 42 = null (matches SQL)
3) null + 42 == 0 + 42 = 42 (matches counters)

In SQL you get NULL (option 2), but CQL counters treat NULL
as 0 (option 3) meaning we already do not match SQL (though
counters are not a standard SQL type so might not be
applicable).  Personally I lean towards option 3 as the
"zero" for addition and subtraction is 0 (1 for
multiplication and division).

So looking for feedback so we can update in CASSANDRA-17857
before 4.1 release.



[DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Claude Warren via dev
I have just posted a CEP covering an Enhancement for Sparse Data
Serialization.  This is in response to CASSANDRA-8959.


I look forward to responses.




Re: [DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-05 Thread Claude Warren via dev
I am just learning the ropes here so perhaps it is not CEP worthy.  That 
being said, it felt like there was a lot of information to put into and 
track in a ticket, particularly when I expected discussion about how to 
best encode, changes to the algorithms etc.  It feels like it would be 
difficult to track. But if that is standard for this project I will move 
the information there.


As to the benchmarking, I had thought that usage and performance 
measures should be included.  Thank you for calling out the subset of 
data selected query as being of particular importance.


Claude

On 06/09/2022 03:11, Abe Ratnofsky wrote:

Looking at this link: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-23%3A++Enhancement+for+Sparse+Data+Serialization

Do you have any plans to include benchmarks in your test plan? It would be 
useful to include disk usage / read performance / write performance comparisons 
with the new encodings, particularly for sparse collections where a subset of 
data is selected out of a collection.

I do wonder whether this is CEP-worthy. The CEP says that the changes will not 
impact existing users, will be backwards compatible, and overall is an 
efficiency improvement. The CEP guidelines say a CEP is encouraged “for 
significant user-facing or changes that cut across multiple subsystems”. Any 
reason why a Jira isn’t sufficient?

Abe


On Sep 5, 2022, at 1:57 AM, Claude Warren via dev  
wrote:

I have just posted a CEP covering an Enhancement for Sparse Data Serialization.
This is in response to CASSANDRA-8959.

I look forward to responses.




Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-09-07 Thread Claude Warren via dev

My vote is B

On 07/09/2022 13:12, Benedict wrote:
I’m not convinced there’s been adequate resolution over which approach 
is adopted. I know you have expressed a preference for the table 
schema approach, but the weight of other opinion so far appears to be 
against this approach - even if it is broadly adopted by other 
databases. I will note that Postgres does not adopt this approach, it 
has a more sophisticated security label approach that has not been 
proposed by anybody so far.


I think extra weight should be given to the implementer’s preference, 
so while I personally do not like the table schema approach, I am 
happy to accept this is an industry norm, and leave the decision to you.


However, we should ensure the community as a whole endorses this. I 
think an indicative poll should be undertaken first, eg:


A) We should implement the table schema approach, as proposed
B) We should prefer the view approach, but I am not opposed to the 
implementor selecting the table schema approach for this CEP
C) We should NOT implement the table schema approach, and should 
implement the view approach
D) We should NOT implement the table schema approach, and should 
implement some other scheme (or not implement this feature)


Where my vote is B


On 7 Sep 2022, at 12:50, Andrés de la Peña  wrote:


If nobody has more concerns regarding the CEP I will start the vote 
tomorrow.


On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña 
 wrote:


Is there enough support here for VIEWS to be the
implementation strategy for displaying masking functions?


I'm not sure that views should be "the" strategy for masking
functions. We have multiple approaches here:

1) CQL functions only. Users can decide to use the masking
functions on their own will. I think most dbs allow this pattern
of usage, which is quite straightforward. Obviously, it doesn't
allow admins to enforce that users see only masked data.
Nevertheless, it's still useful for trusted database users
generating masked data that will be consumed by the end users of
the application.

2) Masking functions attached to specific columns. This way the
same queries will see different data (masked or not) depending on
the permissions of the user running the query. It has the
advantage of not requiring to change the queries that users with
different permissions run. The downside is that users would need
to query the schema if they need to know whether a column is
masked, unless we change the names of the returned columns. This
is the approach offered by Azure/SQL Server, PostgreSQL, IBM Db2,
Oracle, MariaDB/MaxScale and SnowFlake. All these databases
support applying the masking function to columns on the base
table, and some of them also allow to apply masking to views.

3) Masking functions as part of projected views. This way users
might need to query the view appropriate for their permissions
instead of the base table. This might mean changing the queries
if the masking policy is changed by the admin. MySQL recommends
this approach on a blog entry, although it's not part of its main
documentation for data masking, and the implementation has
security issues. Some of the other databases offering the
approach 2) as their main option also support masking on view
columns.

Each approach has its own advantages and limitations, and I don't
think we necessarily have to choose. The CEP proposes
implementing 1) and 2), but nothing impedes us from also having 3) if
we get to have projected views. However, I think that projected
views is a new general-purpose feature with its own complexities,
so it would deserve its own CEP, if someone is willing to work on
the implementation.
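For illustration, approach 2) might read as follows. The syntax is the CEP-20 proposal (mask_inner and the UNMASK permission), so it is subject to change until the CEP lands; the table is hypothetical:

```cql
-- Sketch of approach 2) as proposed in CEP-20; syntax subject to the CEP, not final.
CREATE TABLE ks.users (
    id int PRIMARY KEY,
    email text MASKED WITH mask_inner(2, 2)  -- only the first and last 2 characters shown
);

-- Queries are unchanged: users without the UNMASK permission transparently
-- see masked values, while trusted roles see the raw column.
GRANT UNMASK ON TABLE ks.users TO trusted_role;
```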



On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev
 wrote:

Is there enough support here for VIEWS to be the
implementation strategy for displaying masking functions?

It seems to me the view would have to store the query and
apply a where clause to it, so the same PK would be in play.

It has data leaking properties.

It has more use cases as it can be used to

  * construct views that filter out sensitive columns
  * apply transforms to convert units of measure

Are there more thoughts along this line?


Re: [DISCUSS] CEP-23: Enhancement for Sparse Data Serialization

2022-09-07 Thread Claude Warren via dev
I have looked through the code mentioned.  What I found in the 
ColumnSerializer was the use of VInt encoding.  Are you proposing 
switching directly to VInt encoding for sizes rather than one of the 
other encodings?  Using a -2 as the first length to signal that the new 
encoding is in use so that existing encodings can be read unchanged?
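The sentinel idea can be sketched as follows. This is a toy using fixed-width ints in a ByteBuffer rather than Cassandra's actual vint codec; only the dispatch-on--2 shape is the point, and all names are illustrative:

```java
import java.nio.ByteBuffer;

// Sketch of the sentinel idea: -2 flags the new encoding; old values read unchanged.
final class SentinelFormat {
    static final int NEW_FORMAT = -2; // legacy lengths are >= -1 (-1 meaning null)

    static void writeLength(ByteBuffer out, int length, boolean newFormat) {
        if (newFormat) {
            out.putInt(NEW_FORMAT); // sentinel first...
        }
        out.putInt(length);         // ...then the length (vint-encoded in the real proposal)
    }

    static int readLength(ByteBuffer in) {
        int first = in.getInt();
        // Legacy readers never wrote -2, so old data takes the second branch untouched.
        return first == NEW_FORMAT ? in.getInt() : first;
    }
}
```

Because -2 is impossible as a legacy length, readers that do not see it fall through to the existing decode path, which is what makes the scheme backward compatible.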



On 06/09/2022 16:37, Benedict wrote:

So, looking more closely at your proposal I realise what you are trying to do. 
The thing that threw me was your mention of lists and other collections. This 
will likely not work as there is no index that is possible to define on a list 
(or other collection) within a single sstable - a list is defined over the 
whole on-disk contents, so the index is undefined within a given sstable.

Tuple and UDT are encoded inefficiently if there are many null fields, but this 
is a very localised change, affecting just one class. You should take a look at 
Columns.Serializer for code you can lift for encoding and decoding sparse 
subsets of fields.

It might be that this can be switched on or off per sstable with a header flag 
bit so that there is no additional cost for datasets that would not benefit. 
Likely we can also migrate to vint encoding for the component sizes also (and 
either 1 or 0 bytes for fixed width values), no doubt saving a lot of space 
over the status quo, even for small UDT with few null entries.

Essentially at this point we’re talking about pushing through storage 
optimisations applied elsewhere to tuples and UDT, which is a very 
uncontroversial change.


On 6 Sep 2022, at 07:28, Benedict  wrote:

I agree a Jira would suffice, and if visibility there required a DISCUSS 
thread or simply a notice sent to the list.

While we’re here though, while I don’t have a lot of time to engage in 
discussion it’s unclear to me what advantage this encoding scheme brings. It 
might be worth outlining what algorithmic advantage you foresee for what data 
distributions in which collection types.


On 6 Sep 2022, at 07:16, Claude Warren via dev  wrote:

I am just learning the ropes here so perhaps it is not CEP worthy.  That being 
said, it felt like there was a lot of information to put into and track in a 
ticket, particularly when I expected discussion about how to best encode, 
changes to the algorithms etc.  It feels like it would be difficult to track. 
But if that is standard for this project I will move the information there.

As to the benchmarking, I had thought that usage and performance measures 
should be included.  Thank you for calling out the subset of data selected 
query as being of particular importance.

Claude


On 06/09/2022 03:11, Abe Ratnofsky wrote:

Looking at this link: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-23%3A++Enhancement+for+Sparse+Data+Serialization

Do you have any plans to include benchmarks in your test plan? It would be 
useful to include disk usage / read performance / write performance comparisons 
with the new encodings, particularly for sparse collections where a subset of 
data is selected out of a collection.

I do wonder whether this is CEP-worthy. The CEP says that the changes will not 
impact existing users, will be backwards compatible, and overall is an 
efficiency improvement. The CEP guidelines say a CEP is encouraged “for 
significant user-facing or changes that cut across multiple subsystems”. Any 
reason why a Jira isn’t sufficient?

Abe


On Sep 5, 2022, at 1:57 AM, Claude Warren via dev  
wrote:

I have just posted a CEP covering an Enhancement for Sparse Data Serialization.
This is in response to CASSANDRA-8959.

I look forward to responses.




Committer needed for Deprecate Throwables.propagate usage

2022-09-20 Thread Claude Warren via dev
I made the necessary fixes to remove the deprecated Throwables.propagate 
calls.  However, I need a committer to review.


https://issues.apache.org/jira/browse/CASSANDRA-14218

Thank you,

Claude



Weird results

2022-12-15 Thread Claude Warren, Jr via dev
I am working on a StandaloneDowngrader.java based on StandaloneUpgrader.java.

While working on the tests I had a problem with 2 tests (testFlagArgs and
testDefaultCall) that failed with:

ERROR [main] 2022-12-14 10:35:20,051 SSTableReader.java:496 - Cannot open
/home/claude/apache/cassandra/build/test/cassandra/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/nb-41-big;
partitioner org.apache.cassandra.dht.ByteOrderedPartitioner does not match
system partitioner org.apache.cassandra.dht.Murmur3Partitioner.  Note that
the default partitioner starting with Cassandra 1.2 is Murmur3Partitioner,
so you will need to edit that to match your old partitioner if upgrading.

The same tests failed in the StandaloneUpgraderTests on which the
StandaloneDowngraderTests are based.

After chatting with Jake I added code to set the partitioner using
DatabaseDescriptor.setPartitionerUnsafe() and a try/catch block to make
sure it got reset in one test.  BOTH tests worked.

I then removed the just added code and both tests continued to work.

I restarted the IDE and both tests continued to work.

So I am not sure how adding and then removing code (including the include
statements) can make the tests work.  But I wanted to post this here so
that if there are other weird cases perhaps we can figure out what is
happening.
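For the record, the shape of the fix described above, reduced to a generic sketch. PartitionerHolder here stands in for Cassandra's DatabaseDescriptor; the names are illustrative, not the real API:

```java
import java.util.function.Supplier;

// Stand-in for a global like DatabaseDescriptor's partitioner setting.
final class PartitionerHolder {
    static String partitioner = "Murmur3Partitioner";
}

final class TestHarness {
    /** Swap the global for the duration of the test, restore it no matter what. */
    static String runWith(String partitioner, Supplier<String> test) {
        String previous = PartitionerHolder.partitioner;
        PartitionerHolder.partitioner = partitioner;  // like setPartitionerUnsafe(...)
        try {
            return test.get();
        } finally {
            PartitionerHolder.partitioner = previous; // restored even if the test throws
        }
    }
}
```

A try/finally like this also rules out one class of "tests pass after an unrelated change" mysteries: a test that mutates the global without restoring it can poison later tests in ways that depend on execution order.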


upgrade sstable selection

2023-01-10 Thread Claude Warren, Jr via dev
Greetings,

I am working on the downgradesstables code and seem to have a problem with
ordering of the downgrade or perhaps the Directories.SSTableLister

I lifted the code from upgradesstables to select the files to downgrade.
The only difference in the code that selects the files to downgrade is the
actual selection of the file.  There is no change to the ordering of the
files that are evaluated for inclusion.  Yet I think the downgrade ordering
is incorrect.

My process is to start a 3.1 version to create the tables and then use the
4.0 code base to run the standaloneupgrader and then the
standalonedowngrader.

When running the standaloneupgrader on system local I see the following
{{noformat}}
Found 3 sstables that need upgrading.
Upgrading
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-1-big-Data.db')
Upgrade of
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-1-big-Data.db')
complete.
Upgrading
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-2-big-Data.db')
Upgrade of
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-2-big-Data.db')
complete.
Upgrading
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-3-big-Data.db')
Upgrade of
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-3-big-Data.db')
complete.
{{noformat}}

When running the standalonedowngrader I see
{{noformat}}
Found 3 sstables that need downgrading.
Downgrading
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-6-big-Data.db')
Downgrade of
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-6-big-Data.db')
complete.
Downgrading
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-4-big-Data.db')
Downgrade of
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-4-big-Data.db')
complete.
Downgrading
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5-big-Data.db')
Downgrade of
BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5-big-Data.db')
complete.
{{noformat}}

Note the order of the generations in the downgrader (I have seen similar
out of order issues with the upgrader, but infrequently)

The difference between the upgrader and downgrader code in the questionable
section (
https://github.com/Claudenw/cassandra/blob/CASSANDRA-8928/src/java/org/apache/cassandra/tools/StandaloneDowngrader.java#:~:text=new%20ArrayList%3C%3E()%3B-,//%20Downgrade%20sstables,%7D,-int%20numSSTables%20%3D)
is on line 101 where the files are selected and put into a list.  I think
this means that the Directories.SSTableLister on occasion returns files in
the incorrect order during a call to lister.list().entrySet()

I believe that the files are processed in the order specified and that the
generations get switched around.  This is evidenced by the file size of the
Data file associated with the generations as it moves through the process.
In this case the output from the run shows nb-6 becoming ma-7, when what we
actually want is for nb-6 to become ma-9.

{{noformat}}
    ma            nb            ma
1    212       4    212       8    212
2     64       5     70       9     70
3   4876       6   4883       7   4883
{{noformat}}

So now my question, has anyone seen the behaviour before?

Oh, to make things more interesting, I am using Docker images of 3.1 and a
modified 4.0 image that does not start Cassandra automatically, so I can
just run the upgrade and downgrade.

Any help would be appreciated,
Claude


Re: upgrade sstable selection

2023-01-10 Thread Claude Warren, Jr via dev
Actually, the Directories.SSTableLister stores the Components in a HashMap
indexed by the Descriptor.  Since the upgrade/downgrade code retrieves the
list in hash order there is no guarantee that the entries will be in
generation order.  I suspect that this is a bug.
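If the lister really does hand descriptors back in HashMap order, the usual fix is to sort explicitly before processing. A minimal sketch of the idea, using a stand-in record rather than the real org.apache.cassandra.io.sstable.Descriptor:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not the actual Cassandra classes): when sstable
// descriptors come out of a HashMap, iteration order is arbitrary, so a
// tool that must process generations in order has to sort explicitly.
public class GenerationOrder
{
    // Stand-in for org.apache.cassandra.io.sstable.Descriptor
    public record Descriptor(String version, int generation) {}

    public static List<Descriptor> inGenerationOrder(Map<Descriptor, String> listed)
    {
        List<Descriptor> sstables = new ArrayList<>(listed.keySet());
        // HashMap order depends on hash codes; sort by generation instead.
        sstables.sort(Comparator.comparingInt(Descriptor::generation));
        return sstables;
    }

    public static void main(String[] args)
    {
        Map<Descriptor, String> listed = new HashMap<>();
        listed.put(new Descriptor("nb", 6), "Data.db");
        listed.put(new Descriptor("nb", 4), "Data.db");
        listed.put(new Descriptor("nb", 5), "Data.db");
        for (Descriptor d : inGenerationOrder(listed))
            System.out.println(d.version() + "-" + d.generation());
        // prints nb-4, nb-5, nb-6 regardless of HashMap iteration order
    }
}
```

The same sort could happen once, inside the lister, rather than in each tool.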

On Tue, Jan 10, 2023 at 12:34 PM Brandon Williams  wrote:

> > I think this means that the Directories.SSTableLister on occasion
> returns files in the incorrect order during a call to
> lister.list().entrySet()
>
> This seems easy enough to verify by looping it and examining the results.
>
> Kind Regards,
> Brandon
>
> On Tue, Jan 10, 2023 at 4:44 AM Claude Warren, Jr via dev
>  wrote:
> >
> > Greetings,
> >
> > I am working on the downgradesstables code and seem to have a problem
> with ordering of the downgrade or perhaps the Directories.SSTableLister
> >
> > I lifted the code from upgradesstables to select the files to
> downgrade.  The only difference in the code that selects the files to
> downgrade is the actual selection of the file.  There is no change to the
> ordering of the files that are evaluated for inclusion.  Yet I think the
> downgrade ordering is incorrect.
> >
> > My process is to start 3.1 version to create the tables and then use the
> 4.0 code base to run the standaloneupgrader and then the
> standalonedowngrader
> >
> > When running the standaloneupgrader on system local I see the following
> > {{noformat}}
> > Found 3 sstables that need upgrading.
> > Upgrading
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-1-big-Data.db')
> > Upgrade of
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-1-big-Data.db')
> complete.
> > Upgrading
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-2-big-Data.db')
> > Upgrade of
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-2-big-Data.db')
> complete.
> > Upgrading
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-3-big-Data.db')
> > Upgrade of
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/ma-3-big-Data.db')
> complete.
> > {{noformat}}
> >
> > when running the standalonedowngrader I see
> > {{noformat}}
> > Found 3 sstables that need downgrading.
> > Downgrading
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-6-big-Data.db')
> > Downgrade of
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-6-big-Data.db')
> complete.
> > Downgrading
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-4-big-Data.db')
> > Downgrade of
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-4-big-Data.db')
> complete.
> > Downgrading
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5-big-Data.db')
> > Downgrade of
> BigTableReader(path='/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5-big-Data.db')
> complete.
> > {{noformat}}
> >
> > Note the order of the generations in the downgrader (I have seen similar
> out of order issues with the upgrader, but infrequently)
> >
> > The difference between the upgrader and downgrader code in the
> questionable section (
> https://github.com/Claudenw/cassandra/blob/CASSANDRA-8928/src/java/org/apache/cassandra/tools/StandaloneDowngrader.java#:~:text=new%20ArrayList%3C%3E()%3B-,//%20Downgrade%20sstables,%7D,-int%20numSSTables%20%3D)
> is on line 101 where the files are selected and put into a list.  I think
> this means that the Directories.SSTableLister on occasion returns files in
> the incorrect order during a call to lister.list().entrySet()
> >
> > I believe that the files are processed in the order specified and that
> the generations get switched around.  This is evidenced by the file size of
> the Data file associated with the generations as it moves through the
> process.  In this case we expect the nb-6 to become ma-7 as per the output
> from the run.  In actuality we want nb-6 to be nb-9.
> >
> > {{noformat}}
> >   ma   nb  ma
> > 1  212 4  2128  212
> > 2   64 5   709   70
> > 3 4876 6 48837 4883
> > {{noformat}}
> >
> > So now my question, has anyone seen the behaviour before?
> >
> > Oh, to make things more interesting I am using Docker images of 3.1 and
> a modified 4.0 that turns off the execution so I can just run the upgrade
> and downgrade.
> >
> > Any help would be appreciated,
> > Claude
>


Re: [DISCUSS] Clear rules about sstable versioning and downgrade support

2023-01-16 Thread Claude Warren, Jr via dev
What does this mean for the Trie sstable format?

Would it perhaps make sense to version the sstable upgrader (and future
downgrader) based on the highest version they understand?  For example,
sstableupgrader version N would handle the n* versions so it can upgrade
from m*, while sstabledowngrader version N could downgrade from n* back to
m*, where m* is the lower limit that the downgrader can write.  In this way
users trying to upgrade from out-of-support major releases can use multiple
upgraders and downgraders to move forward and/or recover from a failed
upgrade.

In this case the sstableupgrader and sstabledowngrader should report any
tables they cannot handle and refuse to execute the upgrade/downgrade.


On Fri, Jan 13, 2023 at 1:17 PM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Hi,
>
> I'd like to bring that topic to your attention. I think that we should
> think about allowing users to downgrade under certain conditions. For
> example, always allow for downgrading to any previous minor release.
>
> Clear rules should make users feel safer when upgrading and perhaps
> encourage trying Cassandra at all.
>
> One of the things related to that is sstable format version. It consists
> of major and minor components and is incremented independently from
> Cassandra releases. One rule here is that a Cassandra release producing
> sstables at version XY should be able to read any sstable with version
> (X-1)* and X* (which means all the minor versions of X, including future
> ones).  Perhaps we could make some commitment to change the major sstable
> format only with a new major release?
>
> What do you think?
>
> Thanks
> - - -- --- -  -
> Jacek Lewandowski
>


Upgrading sstables and default partitioner.

2023-01-26 Thread Claude Warren, Jr via dev
Greetings,

I am working on porting a fix for table upgrade order into V3.0 and have
come across the following issue:

ERROR 10:23:31 Cannot open
/home/claude/apache/cassandra/build/test/cassandra/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/me-89-big;
partitioner org.apache.cassandra.dht.ByteOrderedPartitioner does not match
system partitioner org.apache.cassandra.dht.Murmur3Partitioner.  Note that
the default partitioner starting with Cassandra 1.2 is Murmur3Partitioner,
so you will need to edit that to match your old partitioner if upgrading.

Now I know that this can be corrected by setting the default partitioner in
the test code for later versions but in 3.0 we are simply calling the
bin/sstableupgrade script.  This got me wondering.


   1. Should the upgrade fail if the partitioner is different?  I think
   that the upgrade should simply rewrite the format and leave the
   specified partitioner as it is.  If we need to change the partitioner then
   we need a way to do it with a command line/environment option for the
   sstableupgrade tool.
   2. At what point did the system move from being ByteOrdered to Murmur3?
   Wouldn't the upgradetables script have failed at that point?
   3. #2 leads me to ask, who uses the upgradetables script?  Since later
   Cassandra versions can read earlier versions it must only be used when
   skipping entire versions.
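Point 1 above suggests an override option. A minimal sketch of how such a tool could resolve the partitioner, assuming a hypothetical -Dcassandra.partitioner override rather than any existing sstableupgrade flag:

```java
// Hypothetical sketch of how an sstableupgrade-style tool could accept a
// partitioner override instead of failing on a mismatch. The property name
// and fallback behaviour are assumptions for illustration, not existing
// Cassandra options.
public class PartitionerOption
{
    static final String DEFAULT = "org.apache.cassandra.dht.Murmur3Partitioner";

    // An explicit override wins; otherwise keep whatever the sstable
    // metadata declares; fall back to the modern default only as a last resort.
    public static String resolve(String fromSSTable, String override)
    {
        if (override != null && !override.isEmpty())
            return override;
        return fromSSTable != null ? fromSSTable : DEFAULT;
    }

    public static void main(String[] args)
    {
        // e.g. invoked as: sstableupgrade -Dcassandra.partitioner=...
        String override = System.getProperty("cassandra.partitioner");
        System.out.println(resolve("org.apache.cassandra.dht.ByteOrderedPartitioner", override));
    }
}
```

With no override set, the tool would simply keep the partitioner recorded in the sstable, which is the behaviour item 1 argues for.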

Discussion/solutions for these questions/problems is greatly appreciated,
Claude


Re: [DISCUSSION] Cassandra's code style and source code analysis

2023-01-27 Thread Claude Warren, Jr via dev
Turn it on at warning (or lower) level now, so people have some idea of the
size of change to their current code.

On Wed, Jan 25, 2023 at 12:05 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Thank you Maxim for doing this.
>
> It is nice to see this effort materialized in a PR.
>
> I would wait until bigger chunks of work are committed to trunk (like
> CEP-15) to not collide too much. I would say we can postpone doing this
> until the actual 5.0 release, last weeks before it so we would not clash
> with any work people would like to include in 5.0. This can go in anytime,
> basically.
>
> Are people on the same page?
>
> Regards
>
> 
> From: Maxim Muzafarov 
> Sent: Monday, January 23, 2023 19:46
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
> NetApp Security WARNING: This is an external email. Do not click links or
> open attachments unless you recognize the sender and know the content is
> safe.
>
>
>
>
> Hello everyone,
>
> You can find the changes here:
> https://issues.apache.org/jira/browse/CASSANDRA-17925
>
> While preparing the code style configuration for the Eclipse IDE, I
> discovered that there was no easy way to have complex grouping options
> for the set of packages. So we need to add extra blank lines between
> each group of packages so that all the configurations for Eclipse,
> NetBeans, IntelliJ IDEA and checkstyle are aligned. I should have
> checked this earlier for sure, but I only did it for static imports
> and some groups, my bad. The resultant configuration looks like this:
>
> java.*
> [blank line]
> javax.*
> [blank line]
> com.*
> [blank line]
> net.*
> [blank line]
> org.*
> [blank line]
> org.apache.cassandra.*
> [blank line]
> all other imports
> [blank line]
> static all other imports
>
> The pull request is here:
> https://github.com/apache/cassandra/pull/2108
>
> The configuration-related changes are placed in a dedicated commit, so
> it should be easy to make a review:
>
> https://github.com/apache/cassandra/pull/2108/commits/84e292ddc9671a0be76ceb9304b2b9a051c2d52a
>
> 
>
> Another important thing to mention is that the total amount of changes
> for organising imports is really big (more than 2000 files!), so we
> need to decide the right time to merge this PR. Although rebasing or
> merging changes to development branches should become much easier
> ("Accept local" + "Organize imports"), we still need to pay extra
> attention here to minimise the impact on major patches for the next
> release.
>
> On Mon, 16 Jan 2023 at 13:16, Maxim Muzafarov  wrote:
> >
> > Stefan,
> >
> > Thank you for bringing this topic up. I'll prepare the PR shortly with
> > option 4, so everyone can take a look at the amount of changes. This
> > does not force us to go exactly this path, but it may shed light on
> > changes in general.
> >
> > What exactly we're planning to do in the PR:
> >
> > 1. Checkstyle AvoidStarImport rule, so no star imports will be allowed.
> > 2. Checkstyle ImportOrder rule, for controlling the order.
> > 3. The IDE code style configuration for Intellij IDEA, NetBeans, and
> > Eclipse (it doesn't exist for Eclipse yet).
> > 4. The import order according to option 4:
> >
> > ```
> > java.*
> > javax.*
> > [blank line]
> > com.*
> > net.*
> > org.*
> > [blank line]
> > org.apache.cassandra.*
> > [blank line]
> > all other imports
> > [blank line]
> > static all other imports
> > ```
> >
> >
> >
> > On Mon, 16 Jan 2023 at 12:39, Miklosovic, Stefan
> >  wrote:
> > >
> > > Based on the voting we should go with option 4?
> > >
> > > Two weeks passed without anybody joining so I guess folks are all
> happy with that or this just went unnoticed?
> > >
> > > Let's give it time until the end of this week (Friday 12:00 UTC).
> > >
> > > Regards
> > >
> > > 
> > > From: Maxim Muzafarov 
> > > Sent: Tuesday, January 3, 2023 14:31
> > > To: dev@cassandra.apache.org
> > > Subject: Re: [DISCUSSION] Cassandra's code style and source code
> analysis
> > >
> > > NetApp Security WARNING: This is an external email. Do not click links
> or open attachments unless you recognize the sender and know the content is
> safe.
> > >
> > >
> > >
> > >
> > > Folks,
> > >
> > > Let me update the voting status and put together everything we have so
> > > far. We definitely need more votes to have a solid foundation for this
> > > change, so I encourage everyone to consider the options above and
> > > share them in this thread.
> > >
> > >
> > > Total for each applicable option:
> > >
> > > 4-th option -- 4 votes
> > > 3-rd option -- 3 votes
> > > 5-th option -- 1 vote
> > > 1-st option -- 0 votes
> > > 2-nd option -- 0 votes
> > >
> > > On Thu, 22 Dec 2022 at 22:06, Mick Semb Wever  wrote:
> > > >>
> > > >>
> > > >> 3. Total 5 groups, 2968 files to change
> > > >>
> > > >> ```
> > > >> org.apache.cassandra.*
> > > 

Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

2023-01-30 Thread Claude Warren, Jr via dev
Actually, Maxim's proposal does not depend on JMX being present or not.
What the proposal does is make it easier to create/sync multiple
presentations of the same internal data:  Virtual Tables, JMX, Metrics,
next year's  greatest data presentation strategy.  Removing JMX from the
mix just reduces the number of implementations, it does not in any way
invalidate or change the proposal.
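As a rough illustration of that "multiple presentations of the same internal data" idea (all names here are made up for the sketch, not the proposed API): register a metric source once, and let each presentation render it independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy sketch of the framework idea: one metric definition, any number of
// presentations (virtual table, JMX, a metrics exporter, ...). Dropping a
// presentation removes an implementation without touching the definitions.
public class MetricRegistrySketch
{
    public interface Presentation { String render(String name, Object value); }

    record Metric(String name, Supplier<Object> source) {}

    private final List<Metric> metrics = new ArrayList<>();
    private final List<Presentation> presentations = new ArrayList<>();

    public void register(String name, Supplier<Object> source)
    {
        metrics.add(new Metric(name, source));
    }

    public void addPresentation(Presentation p)
    {
        presentations.add(p);
    }

    // Every presentation sees the same source values, so they cannot drift.
    public List<String> renderAll()
    {
        List<String> out = new ArrayList<>();
        for (Presentation p : presentations)
            for (Metric m : metrics)
                out.add(p.render(m.name(), m.source().get()));
        return out;
    }

    public static void main(String[] args)
    {
        MetricRegistrySketch registry = new MetricRegistrySketch();
        registry.register("pending_compactions", () -> 3);
        registry.addPresentation((n, v) -> "vtable: " + n + " = " + v);
        registry.addPresentation((n, v) -> "jmx: " + n + " = " + v);
        registry.renderAll().forEach(System.out::println);
    }
}
```

This is why removing JMX would only shrink the list of presentations; the registry and its metric definitions stay unchanged.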

On Mon, Jan 30, 2023 at 11:27 AM Ekaterina Dimitrova 
wrote:

> +1 on starting that discussion, thank you for volunteering Benjamin!
>
> On Mon, 30 Jan 2023 at 4:59, Benjamin Lerer  wrote:
>
>> It seems to me that the question that we need to answer, before Maxim
>> puts more effort into this work, is: what is the future of JMX?
>> I agree that we have never been clear about that decision up to now. At
>> least now we have a reason to clarify that point.
>>
>> I can start a discussion about that if people agree?
>>
>> Le dim. 29 janv. 2023 à 10:14, Dinesh Joshi  a écrit :
>>
>>> I’m also very interested in this area. I quickly skimmed the proposal
>>> and IIUC it doesn’t call for moving away from JMX. Instead I think it’s
>>> making it easier to expose metrics over various interfaces. Maxim please
>>> correct me if I’m wrong in my understanding.
>>>
>>> I also second Josh’s point on JMX usage. I know it’s disabled in some
>>> deployments but it is not common practice in most places.
>>>
>>> Dinesh
>>>
>>> On Jan 28, 2023, at 4:10 PM, Josh McKenzie  wrote:
>>>
>>> 
>>> First off - thanks so much for putting in this effort Maxim! This is
>>> excellent work.
>>>
>>> Some thoughts on the CEP and responses in thread:
>>>
>>> *Considering that JMX is usually not used and disabled in production
>>> environments for various performance and security reasons, the operator may
>>> not see the same picture from various of Dropwizard's metrics exporters and
>>> integrations as Cassandra's JMX metrics provide [1][2].*
>>>
>>> I don't think this assertion is true. Cassandra is running in a *lot*
>>> of places in the world, and JMX has been in this ecosystem for a long time;
>>> we need data that is basically impossible to get to claim "JMX is usually
>>> not used in C* environments in prod".
>>>
>>> I also wonder about if we should care about JMX?  I know many wish to
>>> migrate (its going to be a very long time) away from JMX, so do we need a
>>> wrapper to make JMX and vtables consistent?
>>>
>>> If we can move away from a bespoke vtable or JMX based implementation
>>> and instead have a templatized solution each of these is generated from,
>>> that to me is the superior option. There's little harm in adding new JMX
>>> endpoints (or hell, other metrics framework integration?) as a byproduct of
>>> adding new vtable exposed metrics; we have the same maintenance obligation
>>> to them as we have to the vtables and if it generates from the same base
>>> data, we shouldn't have any further maintenance burden due to its presence
>>> right?
>>>
>>> we wish to move away from JMX
>>>
>>> I do, and you do, and many people do, but I don't believe *all* people
>>> on the project do. The last time this came up in slack the conclusion was
>>> "Josh should go draft a CEP to chart out a path to moving off JMX while
>>> maintaining backwards-compat w/existing JMX metrics for environments that
>>> are using them" (so I'm excited to see this CEP pop up before I got to it!
>>> ;)). Moving to a system that gives us a 0-cost way to keep JMX and vtable
>>> in sync over time on new metrics seems like a nice compromise for folks
>>> that have built out JMX-based maintenance infra on top of C*. Plus removing
>>> the boilerplate toil on vtables. win-win.
>>>
>>> If we add a column to the end of the JMX row did we just break users?
>>>
>>> I *think* this is arguably true for a vtable / CQL-based solution as
>>> well from the "you don't know how people are using your API" perspective.
>>> Unless we have clear guidelines about discretely selecting the columns you
>>> want from a vtable and trust users to follow them, if people have brittle
>>> greedy parsers pulling in all data from vtables we could very well break
>>> them as well by adding a new column right? Could be wrong here; I haven't
>>> written anything that consumes vtable metric data and maybe the obvious
>>> idiom in the face of that is robust in the presence of column addition.
>>> /shrug
>>>
>>> It's certainly more flexible and simpler to write to w/out detonating
>>> compared to JMX, but it's still an API we'd be revving.
>>>
>>> On Sat, Jan 28, 2023, at 4:24 PM, Ekaterina Dimitrova wrote:
>>>
>>> Overall I have similar thoughts and questions as David.
>>>
>>> I just wanted to add a reminder about this thread from last summer[1].
>>> We already have issues with the alignment of JMX and Settings Virtual
>>> Table. I guess this is how Maxim got inspired to suggest this framework
>>> proposal which I want to thank him for! (I noticed he assigned
>>> CASSANDRA-15254)
>>>
Not to open Pandora's box

Re: Cassandra CI Status 2023-01-07

2023-02-10 Thread Claude Warren, Jr via dev
New Failures from Build Lead Week 5

*** CASSANDRA-18198 - "AttributeError: module 'py' has no attribute 'io'"
reported in multiple tests
- reported in 4.1, 3.11, and 3.0
- identified as a possible class loader issue associated with
CASSANDRA-18150

*** CASSANDRA-18191 - Native Transport SSL tests failing
- TestNativeTransportSSL.test_connect_to_ssl and
TestNativeTransportSSL.test_connect_to_ssl (novnode)
- TestNativeTransportSSL.test_connect_to_ssl_optional and
TestNativeTransportSSL.test_connect_to_ssl_optional (nvnode)


On Mon, Jan 23, 2023 at 10:10 PM Caleb Rackliffe 
wrote:

> New failures from Build Lead Week 4:
>
> *** CASSANDRA-18188 - Test failure in
> upgrade_tests.cql_tests.cls.test_limit_ranges
> - trunk
> - AttributeError: module 'py' has no attribute 'io'
>
> *** CASSANDRA-18189 - Test failure in
> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_with_timeouts
> - 4.0
> - assert 10 == 94764
> - other failures currently open in this test class, but at least
> superficially, different errors (see CASSANDRA-17322, CASSANDRA-18162)
>
> Timeouts continue to manifest in many places.
>
> On Sun, Jan 15, 2023 at 6:02 AM Mick Semb Wever  wrote:
>
>> *** The Butler (Build Lead)
>>>
>>> The introduction of Butler and the Build Lead was a wonderful
>>> improvement to our CI efforts.  It has brought a lot of hygiene in
>>> listing out flakies as they happened.  Noted that this has in-turn
>>> increased the burden in getting our major releases out, but that's to
>>> be seen as a one-off cost.
>>>
>>
>>
>> New Failures from Build Lead Week 3.
>>
>>
>> *** CASSANDRA-18156
>> – 
>> repair_tests.deprecated_repair_test.TestDeprecatedRepairNotifications.test_deprecated_repair_error_notification
>>  - AssertionError: Node logs don't have an error message for the failed
>> repair
>>  - hard regression
>>  - 3.0, 3.11,
>>
>> *** CASSANDRA-18164 – CASTest Message serializedSize(12) does not match
>> what was written with serialize(out, 12) for verb
>> PAXOS2_COMMIT_AND_PREPARE_RSP
>>  - serializer class org.apache.cassandra.net.Message$Serializer;
>> expected 1077, actual 1079
>>  - 4.1, trunk
>>
>> *** CASSANDRA-18158
>> – 
>> org.apache.cassandra.distributed.upgrade.MixedModeReadTest.mixedModeReadColumnSubsetDigestCheck
>>  - Cannot achieve consistency level ALL
>>  - 3.11, trunk
>>
>> *** CASSANDRA-18159 – repair_tests.repair_test.TestRepair.test_*dc_repair
>>   - AssertionError: null
>> in MemtablePool$SubPool.released(MemtablePool.java:193)
>>  - 3.11, 4.0, 4.1, trunk
>>
>> *** CASSANDRA-18160
>> – 
>> cdc_test.TestCDC.test_insertion_and_commitlog_behavior_after_reaching_cdc_total_space
>>  - Found orphaned index file in after CDC state not in former
>>  - 4.1, trunk
>>
>> *** CASSANDRA-18161 –
>>  
>> org.apache.cassandra.transport.CQLConnectionTest.handleCorruptionOfLargeMessageFrame
>>  - AssertionFailedError in
>> CQLConnectionTest.testFrameCorruption(CQLConnectionTest.java:491)
>>  - 4.0, 4.1, trunk
>>
>> *** CASSANDRA-18162 –
>> cqlsh_tests.test_cqlsh_copy.TestCqlshCopy.test_bulk_round_trip_non_prepared_statements
>> - Inet address 127.0.0.3:7000 is not available: [Errno 98] Address
>> already in use
>> - 3.0, 3.11, 4.0, 4.1, trunk
>>
>> *** CASSANDRA-18163 –
>>  
>> transient_replication_test.TestTransientReplicationRepairLegacyStreaming.test_speculative_write_repair_cycle
>>  - AssertionError Incoming stream entireSSTable
>>  - 4.0, 4.1, trunk
>>
>>
>> While writing these up, some thoughts…
>>  - While Butler reports failures against multiple branches, there's no
>> feedback/sync that the ticket needs its fixVersions updated when failures
>> happen in other branches after the ticket is created.
>>  - In 4.0 onwards, a majority of the failures are timeouts (>900s),
>> reinforcing that the current main problem we are facing in ci-cassandra.a.o
>> is saturation/infra
>>
>>
>>
>>
>>


downgrade sstables

2023-02-20 Thread Claude Warren, Jr via dev
I have been working on downgrading sstables for a while now.  I have the
downgrader mostly working.  The only issue is when downgrading
system tables.

Specifically, during the 3.1 -> 4.0 changes a column broadcast_port was
added to system/local.  This means that a 3.1 system cannot read the table
as it has no definition for it.  I tried marking the column for deletion in
the metadata and in the serialization header.  The latter got past the
column-not-found problem, but I suspect that it just means that the data
columns after broadcast_port shifted and so are incorrectly read.

I tried to run scrub to see if it would clean up the issues.  No luck.  I
suspect I need to rewrite data columns and remove the broadcast_port
altogether.

Does anyone have a suggestion for how to approach this problem?  Is
rewriting the sstable the only solution?  Can anyone point me to
example code for forcing the rewrite in an offline system?

Thanks,
Claude


Re: Downgradability

2023-02-21 Thread Claude Warren, Jr via dev
My goal in implementing CASSANDRA-8928 was to be able to take the current
version 4.x and write it as the earliest 3.x
version possible.  The reasoning being that if that was possible then
whatever 3.x version was executed would be able to automatically read the
early 3.x version.  My thought was that each release version would have the
ability to downgrade to the earliest previous version.  In this way if
users need to they could string together a number of downgrader versions to
move from 5.x to 3.x.

My testing has been pretty straightforward: I created 4 docker containers
using the standard published Cassandra docker images for 3.1 and 4.0 with
data mounted on an external drive.  Two of the containers (one of each
version) did not automatically start Cassandra.  My process was then:

   1. start and stop Cassandra 4.0 to create the default data files
   2. start the Cassandra 4.0 container that does not automatically run
   Cassandra and execute the new downgrade functionality.
   3. start the Cassandra 3.1 container and dump the logs.  If the system
   started then I knew that I at least had a proof of concept.  So far no-go.



On Tue, Feb 21, 2023 at 8:57 AM Branimir Lambov <
branimir.lam...@datastax.com> wrote:

> It appears to me that the first thing we need to start this feature off is
> a definition of a suite of tests together with a set of rules to keep the
> suite up to date with new features as they are introduced. The moment that
> suite is in place, we can start having some confidence that we can enforce
> downgradability.
>
> Something like this will definitely catch incompatibilities in SSTable
> formats (such as the one in CASSANDRA-17698 that I managed to miss during
> review), but will also be able to identify incompatible system schema
> changes among others, and at the same time rightfully ignore non-breaking
> changes such as modifications to the key cache serialization formats.
>
> Is downgradability in scope for 5.0? It is a feature like any other, and I
> don't see any difficulty adding it (with support for downgrade to 4.x) a
> little later in the 5.x timeline.
>
> Regards,
> Branimir
>
> On Tue, Feb 21, 2023 at 9:40 AM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> I'd like to mention CASSANDRA-17056 (CEP-17) here as it aims to introduce
>> multiple sstable formats support. It allows for providing an implementation
>> of SSTableFormat along with SSTableReader and SSTableWriter. That could be
>> extended easily to support different implementations for certain version
>> ranges, like one impl for ma-nz, other for oa+, etc. without having a
>> confusing implementation with a lot of conditional blocks. Old formats in
>> such case could be maintained separately from the main code and easily
>> switched any time.
>>
>> thanks
>> - - -- --- -  -
>> Jacek Lewandowski
>>
>>
>> On Tue, 21 Feb 2023 at 01:46, Yuki Morishita  wrote:
>>
>>> Hi,
>>>
>>> What I wanted to address in my comment in CASSANDRA-8110(
>>> https://issues.apache.org/jira/browse/CASSANDRA-8110?focusedCommentId=17641705&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17641705)
>>> is to focus on better upgrade experience.
>>>
>>> Upgrading the cluster can be painful for some orgs with mission critical
>>> Cassandra cluster, where they cannot tolerate less availability because of
>>> the inability to replace the downed node.
>>> They also need to plan rolling back to the previous state when something
>>> happens along the way.
>>> The change I proposed in CASSANDRA-8110 is to achieve the goal of at
>>> least enabling SSTable streaming during the upgrade by not upgrading the
>>> SSTable version. This can make the cluster to easily rollback to the
>>> previous version.
>>> Downgrading SSTable is not the primary focus (though Cassandra needs to
>>> implement the way to write SSTable in older versions, so it is somewhat
>>> related.)
>>>
>>> I'm preparing the design doc for the change.
>>> Also, if I should create a separate ticket from CASSANDRA-8110 for the
>>> clarity of the goal of the change, please let me know.
>>>
>>>
>>> On Tue, Feb 21, 2023 at 5:31 AM Benedict  wrote:
>>>
 FWIW I think 8110 is the right approach, even if it isn’t a panacea. We
 will have to eventually also tackle system schema changes (probably not
 hard), and may have to think a little carefully about other things, eg with
 TTLs the format change is only the contract about what values can be
 present, so we have to make sure the data validity checks are consistent
 with the format we write. It isn’t as simple as writing an earlier version
 in this case (unless we permit truncating the TTL, perhaps)

 On 20 Feb 2023, at 20:24, Benedict  wrote:


 
In a self-organising community, things that aren’t self-policed
naturally end up policed in an ad hoc fashion.

Removing columns from sstables

2023-02-22 Thread Claude Warren, Jr via dev
Greetings,

I have been looking through the code and I can't find any place where
columns are removed from an sstable.   I have found that rows can be
deleted.  Columns can be marked as deleted.  But I have found no place
where the deleted cell is removed from the row.  Is there the concept of
completely removing all traces of the column from the table?

The specific case I am working on is downgrading v4.x system.local table to
v3.1 format.  This involves the removal of the broadcast_port column so
that the hardcoded definition of the v3.1 table can read the sstable from
disk.

Any assistance or pointers would be appreciated,
Claude


Re: Removing columns from sstables

2023-02-22 Thread Claude Warren, Jr via dev
Close.  It is still in the table so the v3.x code that reads system.local
will detect it and fail on an unknown column as that code appears to be
looking at the actual on-disk format.

It sounds like the short answer is that there is no way to physically
remove the column from the on-disk format once it is added.

On Wed, Feb 22, 2023 at 11:28 AM Erick Ramirez 
wrote:

> When a column is dropped from a table, it is added to the
> system.dropped_columns table so it doesn't get returned in the results. Is
> that what you mean? 🙂
>
>>


Re: Downgradability

2023-02-23 Thread Claude Warren, Jr via dev
Broken downgrading can be fixed (I think) by modifying the
SerializationHeader.toHeader() method where it currently throws an
UnknownColumnException.  If we can, instead of throwing the exception,
create a dropped column for the unexpected column, then I think the code
will work.

I realise that to do this in the wild is not possible as it is a change to
released code, but we could handle it going forward.
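The proposed change could look roughly like this; the types and method below are illustrative stand-ins, not the real SerializationHeader internals:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed SerializationHeader.toHeader() change:
// when the on-disk header names a column the local schema does not know,
// synthesize a dropped-column entry instead of throwing, so the reader can
// skip those bytes rather than misalign the columns that follow.
public class TolerantHeader
{
    public record ColumnDef(String name, String type, boolean dropped) {}

    public static Map<String, ColumnDef> resolve(Map<String, String> onDiskColumns,
                                                 Map<String, ColumnDef> schema)
    {
        Map<String, ColumnDef> resolved = new HashMap<>();
        for (Map.Entry<String, String> e : onDiskColumns.entrySet())
        {
            ColumnDef def = schema.get(e.getKey());
            if (def == null)
                // Previously: throw new UnknownColumnException(...).
                // Instead treat the column as dropped; its serialized type
                // still comes from the on-disk header so it can be skipped.
                def = new ColumnDef(e.getKey(), e.getValue(), true);
            resolved.put(e.getKey(), def);
        }
        return resolved;
    }

    public static void main(String[] args)
    {
        Map<String, String> onDisk = new HashMap<>();
        onDisk.put("key", "text");
        onDisk.put("broadcast_port", "int");
        Map<String, ColumnDef> schema = new HashMap<>();
        schema.put("key", new ColumnDef("key", "text", false));
        System.out.println(resolve(onDisk, schema).get("broadcast_port").dropped()); // true
    }
}
```

The key property is that the unknown column keeps its on-disk type, so deserialization can consume the right number of bytes before discarding the value.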

On Wed, Feb 22, 2023 at 11:21 PM Henrik Ingo 
wrote:

> ... ok apparently shift+enter  sends messages now?
>
> I was just saying if at least the file format AND system/tables - anything
> written to disk - can be protected with a switch, then it allows for quick
> downgrade by shutting down the entire cluster and restarting with the
> downgraded binary. It's a start.
>
> To be able to do that live in a distributed system needs to consider much
> more: gossip, streaming, drivers, and ultimately all features, because we
> don't want an application developer to use a shiny new thing that a) may
> not be available on all nodes, or b) may disappear if the cluster has to be
> downgraded later.
>
> henrik
>
> On Thu, Feb 23, 2023 at 1:14 AM Henrik Ingo 
> wrote:
>
>> Just this once I'm going to be really brief :-)
>>
>> Just wanted to share for reference how Mongodb implemented
>> downgradeability around their 4.4 version:
>> https://www.mongodb.com/docs/manual/release-notes/6.0-downgrade-sharded-cluster/
>>
>> Jeff you're right. Ultimately this is about more than file formats.
>> However, ideally if at least the
>>
>> On Mon, Feb 20, 2023 at 10:02 PM Jeff Jirsa  wrote:
>>
>>> I'm not even convinced even 8110 addresses this - just writing sstables
>>> in old versions won't help if we ever add things like new types or new
>>> types of collections without other control abilities. Claude's other email
>>> in another thread a few hours ago talks about some of these surprises -
>>> "Specifically during the 3.1 -> 4.0 changes a column broadcast_port was
>>> added to system/local.  This means that 3.1 system can not read the table
>>> as it has no definition for it.  I tried marking the column for deletion in
>>> the metadata and in the serialization header.  The later got past the
>>> column not found problem, but I suspect that it just means that data
>>> columns after broadcast_port shifted and so incorrectly read." - this is a
>>> harder problem to solve than just versioning sstables and network
>>> protocols.
>>>
>>> Stepping back a bit, we have downgrade ability listed as a goal, but
>>> it's not (as far as I can tell) universally enforced, nor is it clear at
>>> which point we will be able to concretely say "this release can be
>>> downgraded to X".   Until we actually define and agree that this is a
>>> real goal with a concrete version where downgrade-ability becomes real, it
>>> feels like things are somewhat arbitrarily enforced, which is probably very
>>> frustrating for people trying to commit work/tickets.
>>>
>>> - Jeff
>>>
>>>
>>>
>>> On Mon, Feb 20, 2023 at 11:48 AM Dinesh Joshi  wrote:
>>>
 I’m a big fan of maintaining backward compatibility. Downgradability
 implies that we could potentially roll back an upgrade at any time. While I
 don’t think we need to retain the ability to downgrade in perpetuity it
 would be a good objective to maintain strict backward compatibility and
 therefore downgradability until a certain point. This would imply
 versioning metadata and extending it in such a way that prior version(s)
 could continue functioning. This can certainly be expensive to implement
 and might bloat on-disk storage. However, we could always offer an option
 for the operator to optimize the on-disk structures for the current version
 then we can rewrite them in the latest version. This optimizes the storage
 and opens up new functionality. This means new features that can work with
 old on-disk structures will be available while others that strictly require
 new versions of the data structures will be unavailable until the operator
 migrates to the new version. This migration IMO should be irreversible.
 Beyond this point the operator will lose the ability to downgrade which is
 ok.

 Dinesh

 On Feb 20, 2023, at 10:40 AM, Jake Luciani  wrote:

 
 There has been progress on

 https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8928

 Which is similar to what datastax does for DSE. Would this be an
 acceptable solution?

 Jake

 On Mon, Feb 20, 2023 at 11:17 AM guo Maxwell 
 wrote:

> It seems “An alternative solution is to implement/complete
> CASSANDRA-8110 ”
> can give us more options if it is finished😉
>
> Branimir Lambov 于2023年2月20日 周一下午11:03写道:
>
>> Hi everyone,
>>
>> There has been a discussion lately about changes to the sstable
>> format in t

[DISCUSS] Single boilerplate script

2023-02-23 Thread Claude Warren, Jr via dev
Pull request https://github.com/apache/cassandra/pull/1950/files is an
attempt to move the boilerplate coding from the script files into a single
maintainable file.

This change does 4 things:

   1. Moves the standard boiler plate from the standard scripts into a
   single maintainable script to be sourced in the original scripts with
   minimal changes.
   2. Adds debug logging so that problem determination (as in
   CASSANDRA-17773) is easier.
   3. Has code to preserve environment variables when needed (c.f. nodetool
   script).
   4. Provides a verification method that will verify that all standard
   environment variables are set.

In practice this is a simple 2 line replacement for most boilerplate
blocks.  Examples of the simple case (sstableloader) and a more complex
case (nodetool) are included in the current pull request.

If there is consensus, I will update the other scripts in the bin directory
to utilize the sourced boilerplate and request a full review of the pull
request.


Re: Downgradability

2023-02-23 Thread Claude Warren, Jr via dev
You also need to remove the system.local.broadcast_port column as that does
not exist in the earlier version and when the earlier version attempts to
read it the code throws an UnknownColumnException.

On Thu, Feb 23, 2023 at 11:27 AM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Running upgrade tests backwards is a great idea which does not require extra
> work.
>
> For stats metadata it already supports writing in previous serialization
> version
>
> We need a small fix in compression metadata and that's it.
>
> A flag with the write format version is probably LHF.
>
> Maybe let's try,  we still have time to fix it before 5.0
>
>
> czw., 23 lut 2023, 10:57 użytkownik Benedict 
> napisał:
>
>> Forget downgradeability for a moment: we should not be breaking format
>> compatibility without good reason. Bumping a major version isn’t enough of
>> a reason.
>>
>> Can somebody explain to me why this is being fought tooth and nail, when
>> the work involved is absolutely minimal?
>>
>> Regarding tests: what more do you want, than running our upgrade suite
>> backwards?
>>
>>
>> On 23 Feb 2023, at 09:45, Benjamin Lerer  wrote:
>>
>> 
>>
>>> Can somebody explain to me what is so burdensome, that we seem to be
>>> spending longer debating it than it would take to implement the necessary
>>> changes?
>>
>>
>> I believe that we all agree on the outcome. Everybody wants
>> downgradability. The issue is on the path to get there.
>>
>> As far as I am concerned, I would like to see a proper solution and as
>> Jeff suggested the equivalent of the upgrade tests as gatekeepers. Having
>> everybody trying to enforce it in his own way will only lead to a poor
>> result, in my opinion, with some ad hoc code that might not really
>> guarantee real downgradability in the end.
>> We have rushed in the past to get features out and paid the price for it.
>> I simply prefer that we take the time to do things right.
>>
>> Thanks to Scott and you, downgradability got a much better visibility so
>> no matter what approach we pick, I am convinced that we will get there.
>>
>> Le jeu. 23 févr. 2023 à 09:49, Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> a écrit :
>>
>>> Broken downgrading can be fixed (I think) by modifying the
>>> SerializationHeader.toHeader() method where it currently throws an
>>> UnknownColumnException.  If we can, instead of throwing the exception,
>>> create a dropped column for the unexpected column, then I think the code
>>> will work.
>>>
>>> I realise that to do this in the wild is not possible as it is a
>>> change to released code, but we could handle it going forward.
>>>
>>> On Wed, Feb 22, 2023 at 11:21 PM Henrik Ingo 
>>> wrote:
>>>
>>>> ... ok apparently shift+enter  sends messages now?
>>>>
>>>> I was just saying if at least the file format AND system/tables -
>>>> anything written to disk - can be protected with a switch, then it allows
>>>> for quick downgrade by shutting down the entire cluster and restarting with
>>>> the downgraded binary. It's a start.
>>>>
>>>> To be able to do that live in a distributed system needs to consider
>>>> much more: gossip, streaming, drivers, and ultimately all features, because
>>>> we don't want an application developer to use a shiny new thing that a)
>>>> may not be available on all nodes, or b) may disappear if the cluster has
>>>> to be downgraded later.
>>>>
>>>> henrik
>>>>
>>>> On Thu, Feb 23, 2023 at 1:14 AM Henrik Ingo 
>>>> wrote:
>>>>
>>>>> Just this once I'm going to be really brief :-)
>>>>>
>>>>> Just wanted to share for reference how Mongodb implemented
>>>>> downgradeability around their 4.4 version:
>>>>> https://www.mongodb.com/docs/manual/release-notes/6.0-downgrade-sharded-cluster/
>>>>>
>>>>> Jeff you're right. Ultimately this is about more than file formats.
>>>>> However, ideally if at least the
>>>>>
>>>>> On Mon, Feb 20, 2023 at 10:02 PM Jeff Jirsa  wrote:
>>>>>
>>>>>> I'm not even convinced even 8110 addresses this - just writing
>>>>>> sstables in old versions won't help if we ever add things like new types 
>>>>>> or
>>>>>> new types of 

[DISCUSS] Moving standard boiler plate script blocks.

2023-03-22 Thread Claude Warren, Jr via dev
I would like to get some more eyes on
https://github.com/apache/cassandra/pull/1950/files which arises from
CASSANDRA-17773

The basic idea is to:

   - Move the boiler plate script code to a single sourced file.
   - Add code to make debugging scripts easier, in response to
   CASSANDRA-17773

The pull is a WIP as it shows the new sourced file and how it would be
included in the production scripts.  Two examples are provided.

If it is agreed that this is the approach we would like to take I will
complete the updates of the remaining files.

Your input is appreciated,
Claude


[DISCUSS] Initial implementation of cassandra-conf with nodetool example

2023-04-17 Thread Claude Warren, Jr via dev
The pull request [1] is a proposed fix for CASSANDRA-17773.  I am looking
for comments and a decision as to whether to move forward or not with this
change.

The goal is to remove much of the boiler-plate code from scripts without
changing their functionality or arguments and to add the ability to debug
the configuration settings on a running system.

Please take a look and comment,
Claude

[1] https://github.com/apache/cassandra/pull/1950


[COMPRESSION PARAMETERS] Question

2023-04-19 Thread Claude Warren, Jr via dev
Currently the compression parameters have an option called enabled.  When
enabled=false all the other options have to be removed.  But it seems to me
that we should support enabled=false without removing all the other
parameters, so that users can disable compression for testing or problem
resolution without losing any of the other parameter settings.  So to be
clear:

The following is valid:
hints_compression:
  - class_name: foo
    parameters:
      - chunk_length_in_kb: 16

But this is not:
hints_compression:
  - class_name: foo
    parameters:
      - chunk_length_in_kb: 16
    enabled: false

Currently, when enabled is set to false, it constructs a default
CompressionParam object with the class name set to null.

Is there a reason to keep this or should we accept the parameters and
construct a CompressionParam with the parameters while continuing to set
the class name to null?

Claude
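The proposed behaviour can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not Cassandra's actual CompressionParams code:

```python
# Sketch of the proposed behaviour (hypothetical names): with
# enabled=false, keep the supplied parameters and only null out the
# class name, so re-enabling later restores the original settings.
def build_params(options: dict) -> dict:
    opts = dict(options)
    enabled = opts.pop("enabled", True)
    if not enabled:
        opts["class_name"] = None  # marks "disabled"; other params survive
    return opts

cfg = {"class_name": "foo", "chunk_length_in_kb": 16, "enabled": False}
assert build_params(cfg) == {"class_name": None, "chunk_length_in_kb": 16}
```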


[DISCUSS] Standalone downgrader

2023-05-15 Thread Claude Warren, Jr via dev
I have an open pull request [1] to merge in a standalone downgrader.

The problem has been that between v3 and v4 there was a breaking change in
the system local table where the columns "broadcast_port", "listen_port",
and "rpc_port" were added.   The code (in the current pull request)
provides functionality to remove those columns when the older table is
written.  The code also allows for other transformations of the columns,
though none are implemented.

In order for the downgrade to work the following steps are taken (not all
are in this code, some are in a script I have for testing the process)


   1. Execute the standalone downgrade on the desired table(s).
   2. Delete the system_schema tables.
   3. Delete the *-Filter.db, *-Index.db, *-TOC.txt, *-Digest.*, and
   *-Summary.db for the modified table(s)
   4. Delete the original files (e.g. nb-*)
   5. Start the earlier version of the software.

I tested the current code by starting 4.1 to create the tables.  Downgraded
all the tables in the database to "ma" version, followed the above steps
and started 3.11.  According to the logs, 3.11.14 started.

The current pull request code is not as clean as I would like it, but it
does work correctly.

I would like some comments on the general approach for removing columns
where they are filtered out of the row definition during writing.

Your assistance is appreciated,
Claude

[1] https://github.com/apache/cassandra/pull/2045
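The column-filtering approach during the downgrade write can be sketched like this (an illustrative Python sketch with hypothetical names; the actual implementation is Java in the pull request):

```python
# Hypothetical sketch of filtering v4-only columns out of a row while
# writing an sstable in the older "ma" format (names are illustrative).
DROPPED_IN_V3 = {"broadcast_port", "listen_port", "rpc_port"}

def downgrade_row(row: dict, dropped: set = DROPPED_IN_V3) -> dict:
    """Return the row with columns unknown to the target version removed."""
    return {name: value for name, value in row.items() if name not in dropped}

row = {"key": "local", "broadcast_port": 7000, "cluster_name": "Test Cluster"}
assert downgrade_row(row) == {"key": "local", "cluster_name": "Test Cluster"}
```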


Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Claude Warren, Jr via dev
Since the talk was not accepted for Cassandra Summit, would it be possible
to record it as a simple YouTube video and publish it so that the detailed
information about how to use Harry is not lost?

On Thu, May 25, 2023 at 7:36 AM Alex Petrov  wrote:

> While we are at it, we may also want to pull the in-jvm dtest API as a
> submodule, and actually move some tests that are common between the
> branches there.
>
> On Thu, May 25, 2023, at 6:03 AM, Caleb Rackliffe wrote:
>
> Isn’t the other reason Accord works well as a submodule that it has no
> dependencies on C* proper? Harry does at the moment, right? (Not that we
> couldn’t address that…just trying to think this through…)
>
> On May 24, 2023, at 6:54 PM, Benedict  wrote:
>
> 
>
> In this case Harry is a testing module - it’s not something we will
> develop in tandem with C* releases, and we will want improvements to be
> applied across all branches.
>
> So it seems a natural fit for submodules to me.
>
>
> On 24 May 2023, at 21:09, Caleb Rackliffe 
> wrote:
>
> 
> > Submodules do have their own overhead and edge cases, so I am mostly in
> favor of using them for cases where the code must live outside of tree (such as
> jvm-dtest that lives out of tree as all branches need the same interfaces)
>
> Agreed. Basically where I've ended up on this topic.
>
> > We could go over some interesting examples such as testing 2i (SAI)
>
> +100
>
>
> On Wed, May 24, 2023 at 1:40 PM Alex Petrov  wrote:
>
>
> > I'm about to need to harry test for the paging across tombstone work for
> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my
> own overlapping fuzzing came in). In the process, I'll see if I can't
> distill something really simple along the lines of how React approaches it (
> https://react.dev/learn).
>
> We can pick that up as an example, sure.
>
> On Wed, May 24, 2023, at 4:53 PM, Josh McKenzie wrote:
>
> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
> workshop,
>
> I'm about to need to harry test for the paging across tombstone work for
> https://issues.apache.org/jira/browse/CASSANDRA-18424 (that's where my
> own overlapping fuzzing came in). In the process, I'll see if I can't
> distill something really simple along the lines of how React approaches it (
> https://react.dev/learn).
>
> Ideally we'd be able to get something together that's a high level "In the
> next 15 minutes, you will know and understand A-G and have access to N% of
> the power of harry" kind of offer.
>
> Honestly, there's a *lot* in our ecosystem where we could benefit from
> taking a page from their book in terms of onboarding and getting started
> IMO.
>
> On Wed, May 24, 2023, at 10:31 AM, Alex Petrov wrote:
>
> > I wonder if a mini-onboarding session would be good as a community
> session - go over Harry, how to run it, how to add a test?  Would that be
> the right venue?  I just would like to see how we can not only plug it in
> to regular CI but get everyone that wants to add a test be able to know how
> to get started with it.
>
> I have submitted a proposal to Cassandra Summit for a 4-hour Harry
> workshop, but unfortunately it got declined. Goes without saying, we can
> still do it online, time and resources permitting. But again, I do not
> think it should be barring us from making Harry a part of the codebase, as
> it already is. In fact, we can be iterating on the development quicker
> having it in-tree.
>
> We could go over some interesting examples such as testing 2i (SAI),
> modelling Group By tests, or testing repair. If there is enough appetite
> and collaboration in the community, I will see if we can pull something
> like that together. Input on _what_ you would like to see / hear / tested
> is also appreciated. Harry was developed out of a strong need for
> large-scale testing, which also has informed many of its APIs, but we can
> make it easier to access for interactive testing / unit tests. We have been
> doing a lot of that with Transactional Metadata, too.
>
> > I'll hold off on this until Alex Petrov chimes in. @Alex -> got any
> thoughts here?
>
> Yes, sorry for not responding on this thread earlier. I cannot overstate
> how excited I am about this, and how important I think this is. Time
> constraints are somehow hard to overcome, but I hope the results brought by
> TCM will make it all worth it.
>
> On Wed, May 24, 2023, at 4:23 PM, Alex Petrov wrote:
>
> I think pulling Harry into the tree will make adoption easier for the
> folks. I have been a bit swamped with Transactional Metadata work, but I
> wanted to make some of the things we were using for testing TCM available
> outside of TCM branch. This includes a bunch of helper methods to perform
> operations on the clusters, data generation, and more useful stuff. Of
> course, the question always remains about how much time I want to spend
> porting it all to Gossip, but I think we can find a reasonable compromise.
>
> I would not set this improvement as a prerequisi

Bloom filter calculation

2023-07-10 Thread Claude Warren, Jr via dev
Can someone explain to me how the Bloom filter table in
BloomFilterCalculations was derived and how it is supposed to work?  As I
read the table it seems to indicate that with 14 hashes and 20 bits you get
a fp of 6.71e-05.  But if you plug those numbers into the Bloom filter
calculator [1],  that is calculated only for 1 item being in the filter.
If you merge multiple filters together the false positive rate goes up.
And as [1] shows by 5 merges you are over 50% fp rate and by 10 you are at
close to 100% fp.  So I have to assume this analysis is wrong.  Can someone
point me to the correct calculations?

Claude

[1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14


Re: Bloom filter calculation

2023-07-11 Thread Claude Warren, Jr via dev
I think we are talking past each other here.  What I was missing was the
size of the filter.  I was assuming that the size of the filter was the
number of bits specified in BloomFilterCalculations (an error on my
part); what I was missing was the multiplication of the number of bits by
the number of keys.  Is there a fixed number of bits (it looks to be
Integer.MAX_VALUE - 20) or is it calculated somewhere?


On Tue, Jul 11, 2023 at 10:11 AM Benedict  wrote:

> I’m not sure I follow your reasoning. The bloom filter table is false
> positive per sstable given the number of bits *per key*. So for 10 keys you
> would have 200 bits, which yields the same false positive rate as 20 bits
> and 1 key.
>
> It does taper slightly at much larger N, but it’s pretty nominal for
> practical purposes.
>
> I don’t understand what you mean by merging multiple filters together. We
> do lookup multiple bloom filters per query, but only one per sstable, and
> the false positive rate you’re calculating for 10 such lookups would not be
> accurate. This would be 1-(1-0.671)^10 which is still only around 4%,
> not 100%. You seem to be looking at the false positive rate of a bloom
> filter of 20 bits with 10 entries, which means only 2 bits per entry?
>
> On 11 Jul 2023, at 07:14, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
> 
> Can someone explain to me how the Bloom filter table in
> BloomFilterCalculations was derived and how it is supposed to work?  As I
> read the table it seems to indicate that with 14 hashes and 20 bits you get
> a fp of 6.71e-05.  But if you plug those numbers into the Bloom filter
> calculator [1],  that is calculated only for 1 item being in the filter.
> If you merge multiple filters together the false positive rate goes up.
> And as [1] shows by 5 merges you are over 50% fp rate and by 10 you are at
> close to 100% fp.  So I have to assume this analysis is wrong.  Can someone
> point me to the correct calculations?
>
> Claude
>
> [1] https://hur.st/bloomfilter/?n=&p=6.71e-05&m=20&k=14
>
>


[DISCUSS] Tiered Storage

2023-07-24 Thread Claude Warren, Jr via dev
I have been thinking about tiered storage wherein infrequently used data
can be moved off to slow (cold) storage (like S3).  I think that CEP-17 in
conjunction with CEP-21 provides an opportunity for an interesting approach.

As I understand it CEP-17 clarified the SSTables interface(s) so that
alternative implementations are possible, most notably CEP-25 (trie format
sstables).  CEP-21 provides a mechanism by which specific primary key
blocks can be assigned to specific servers.

It seems to me that we could implement an SSTable format that reads/writes
S3 storage and then use CEP-21 to direct specific keys to servers that
implement that storage.

I use primary key because I don't think we can reasonably partition the
records onto cold storage using any other method.

I think that records on the cold storage may be deleted, and may be updated
but both operations may take significant time and would require compaction
to be run at some point.  I expect that compaction would be very slow.

I am certain there are issues with this approach and am looking for
feedback before progressing an architecture proposal.

Thanks,
Claude


Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Claude Warren, Jr via dev
I think that we can get more developers interested if there are available
javadocs.  While many of the core classes are not going to be touched by
someone just starting, being able to understand what the external touch
points are and how they interact with other bits of the system can be
invaluable, particularly when you don't have the entire code base in front
of you.

For example, I just wrote a tool that explores the distribution of keys
across multiple sstables, I needed some of the tools classes but not much
more.  Javadocs would have made that easy if I did not have the source code
in front of me.

I am -1 on removing the javadocs.

On Thu, Aug 3, 2023 at 4:35 AM Josh McKenzie  wrote:

> If anything, the codebase could use a little more package/class/method
> markup in some places
>
> I am impressed with how diplomatic and generous you're being here Derek. :D
>
> On Wed, Aug 2, 2023, at 5:46 PM, Miklosovic, Stefan wrote:
>
> That is a good idea. I would like to have Javadocs valid when going
> through them in IDE. To enforce it, we would have to fix it first. If we
> find a way how to validate Javadocs without actually rendering them, that
> would be cool.
>
> There is a lot of legacy, and rewriting the custom-crafted formatting
> of some comments might be quite a tedious task if it is required to
> have them valid. I am in general for valid documentation and even enforcing
> it, but what to do with what is already there ...
>
> 
> From: Jacek Lewandowski 
> Sent: Wednesday, August 2, 2023 23:38
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?
>
> NetApp Security WARNING: This is an external email. Do not click links or
> open attachments unless you recognize the sender and know the content is
> safe.
>
>
>
> With or without outputting JavaDoc to HTML, there are some errors which we
> should maybe fix. We want to keep the documentation, but there can be
> syntax errors which may prevent IDE generating a proper preview. So, the
> question is - should we validate the JavaDoc comments as a precommit task?
> Can it be done without actually generating HTML output?
>
> Thanks,
> Jacek
>
> śr., 2 sie 2023, 22:24 użytkownik Derek Chen-Becker  > napisał:
> Oh, whoops, I guess I'm the only one that thinks Javadoc is just the tool
> and/or its output (not the markup itself) :P If anything, the codebase
> could use a little more package/class/method markup in some places, so I'm
> definitely only in favor of getting rid of the ant task. I should amend my
> statement to be "...I suspect most people are not opening their browsers
> and looking at Javadoc..." :)
>
> Cheers,
>
> Derek
>
>
>
> On Wed, Aug 2, 2023, 1:30 PM Josh McKenzie  jmcken...@apache.org>> wrote:
> most people are not looking at Javadoc when working on the codebase.
> I definitely use it extensively inside the IDE. But never as a compiled
> set of external docs.
>
> Which is to say, I'm +1 on removing the target and I'd ask everyone to
> keep javadoccing your classes and methods where things are non-obvious or
> there's a logical coupling with something else in the system. :)
>
> On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
> +1. If a need comes up for Javadoc we can fix it at that point, but I
> suspect most people are not looking at Javadoc when working on the codebase.
>
> Cheers,
>
> Derek
>
> On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams  dri...@gmail.com>> wrote:
> I don't think even if it works anyone is going to use the output, so
> I'm good with removal.
>
> Kind Regards,
> Brandon
>
> On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova
> mailto:e.dimitr...@gmail.com>> wrote:
> >
> > Hi everyone,
> > We were looking into a user report around our ant javadoc task recently.
> > That made us realize it is not run in CI; it finishes successfully even
> if there are hundreds of errors, some potentially breaking doc pages.
> >
> > There was a ticket discussion where a few community members mentioned
> that this task was probably unnecessary. Can we remove it, or shall we fix
> it?
> >
> > Best regards,
> > Ekaterina
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |

Fixes for UDF NPE during restart.

2023-08-15 Thread Claude Warren, Jr via dev
CASSANDRA-18739 describes a reproducible NPE on restart with some UDFs.
The solution outlined in that ticket was not used and a much simpler
solution provided by Stefan Miklosovic was implemented.  There are 2 pull
requests open for Cassandra 4.0 and 4.1 that have the fairly simple fix as
well as a test case to show that there was a problem and that the fix
works.

Can someone take a look at the issue and pull requests?  Stefan is working
on a solution for v5 that we may want to get in ASAP.

v4.0 - https://github.com/apache/cassandra/pull/2584

v4.1 - https://github.com/apache/cassandra/pull/2585

Thank you,
Claude


[DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-24 Thread Claude Warren, Jr via dev
I have just filed CEP-36 [1] to allow for keyspace/table storage outside of
the standard storage space.

There are two desires  driving this change:

   1. The ability to temporarily move some keyspaces/tables to storage
   outside the normal directory tree to another disk so that compaction can
   occur in situations where there is not enough disk space for compaction and
   the processing of the moved data cannot be suspended.
   2. The ability to store infrequently used data on slower cheaper storage
   layers.

I have a working POC implementation [2] though there are some issues still
to be solved and much logging to be reduced.

I look forward to productive discussions,
Claude

[1]
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
[2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Claude Warren, Jr via dev
External storage can be any storage that you can produce a FileChannel
for.  There is an S3 library that does this, so S3 is a definite
possibility for storage in this solution.  My example code only writes to a
different directory on the same system.  And there are a couple of places
where I did not catch the file creation; those have to be found and
redirected to the proxy location.  I think that it may be necessary to have
a java FileSystem object to make the whole thing work.  The S3 library that
I found also has an S3 FileSystem class.

This solution uses the internal file name, for example an sstable name.
The proxy factory can examine the entire path and make a determination of
where to read/write the file.  So any determination that can be made based
on the information in the file path can be implemented with this approach.
There is no direct inspection of the data being written to determine
routing.  The only routing data are in the file name.

I ran an inhouse demo where I showed that we could reroute a single table
to a different storage while leaving the rest of the tables in the same
keyspace alone.

In discussing this with a colleague we hit upon the term "tiered nodes".
If you can spread your data across the nodes so that some nodes get the
infrequently used data (cold data) and other nodes receive the frequently
used data (hot data) then the cold data nodes can use this process to store
the data on S3 or similar systems.

On Mon, Sep 25, 2023 at 10:45 AM guo Maxwell  wrote:

> Great suggestion.  Can external storage only be local storage media? Or
> can it be stored in any storage medium, such as object storage (S3)?
> We have previously implemented a tiered storage capability, that is, there
> are multiple storage media on one node (SSD, HDD) with data placement based
> on requests. After briefly browsing the proposal, it seems that there are
> some differences. Can you help explain? Thanks.
>
>
> Claude Warren, Jr via dev  于2023年9月25日周一
> 14:49写道:
>
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
>> of the standard storage space.
>>
>> There are two desires  driving this change:
>>
>>1. The ability to temporarily move some keyspaces/tables to storage
>>outside the normal directory tree to another disk so that compaction can
>>occur in situations where there is not enough disk space for compaction and
>>the processing of the moved data cannot be suspended.
>>2. The ability to store infrequently used data on slower cheaper
>>storage layers.
>>
>> I have a working POC implementation [2] though there are some issues
>> still to be solved and much logging to be reduced.
>>
>> I look forward to productive discussions,
>> Claude
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>
>>
>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Claude Warren, Jr via dev
My intention is to develop an S3 storage system using
https://github.com/carlspring/s3fs-nio

There are several issues yet to be solved:

   1. There are some internal calls that create files in the table
   directory that do not use the channel proxy.  I believe that these are
   making calls on File objects.  I think those File objects are Cassandra
   File objects not Java I/O File objects, but am unsure.
   2. Determine if the carlspring s3fs-nio library will be performant
   enough to work in the long run.  There may be issues with it:
  1. Downloading entire files before using them rather than using views
  into larger remotely stored files.
  2. Requiring a complete file to upload rather than using the partial
  upload capability of the S3 interface.
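s3fs-nio plugs into Java's FileSystemProvider SPI, so an S3 bucket is opened through the same java.nio API the JDK uses for its bundled providers. A rough sketch of that access pattern, using the JDK's built-in zip file system as a dependency-free stand-in (s3fs-nio would take an s3:// URI in the same place; the path names are invented):

```java
import java.net.URI;
import java.nio.file.*;
import java.util.Map;

public class AltFileSystemDemo {
    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempDirectory("demo").resolve("store.zip");
        // Any provider registered via the FileSystemProvider SPI is reachable
        // through FileSystems.newFileSystem; s3fs-nio would use an s3:// URI here.
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            Path data = fs.getPath("/ks1/tbl1/na-1-big-Data.db");
            Files.createDirectories(data.getParent());
            Files.write(data, new byte[]{1, 2, 3});
            // Reads and writes go through the same Path/Files API regardless of backend.
            if (Files.size(data) != 3) throw new AssertionError("unexpected size");
        }
    }
}
```

The point of the sketch is that code written against Path and Files does not care which provider backs the FileSystem — which is why the download/upload granularity issues listed above are the library's problem, not the caller's.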



On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:

> "Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?"
>
> Do agree with Jeff on this. If these features can be supported in OSS
> Cassandra, I think they will be very popular, whether in a private
> deployment environment or a public cloud service (our experience can prove
> it). In addition, it is also a cost-cutting option for users.
>
> Jeff Jirsa wrote on Tue, Sep 26, 2023 at 00:11:
>
>>
>> - I think this is a great step forward.
>> - Being able to move sstables around between tiers of storage is a
>> feature Cassandra desperately needs, especially if one of those tiers is
>> some sort of object storage
>> - This looks like it's a foundational piece that enables that. Perhaps by
>> a team that's already implemented this end to end?
>> - Rather than building this piece by piece, I think it'd be awesome if
>> someone drew up an end-to-end plan to implement tiered storage, so we can
>> make sure we're discussing the whole final state, and not an implementation
>> detail of one part of the final state?
>>
>>
>>
>>
>>
>>
>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
>>> of the standard storage space.
>>>
>>> There are two desires  driving this change:
>>>
>>>1. The ability to temporarily move some keyspaces/tables to storage
>>>outside the normal directory tree to other disk so that compaction can
>>>occur in situations where there is not enough disk space for compaction 
>>> and
>>>the processing to the moved data can not be suspended.
>>>2. The ability to store infrequently used data on slower cheaper
>>>storage layers.
>>>
>>> I have a working POC implementation [2] though there are some issues
>>> still to be solved and much logging to be reduced.
>>>
>>> I look forward to productive discussions,
>>> Claude
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>
>>>
>>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
The intention of the CEP is to lay the groundwork to allow development of
ChannelProxyFactories that are pluggable in Cassandra.  In this way any
storage system can be a candidate for Cassandra storage provided
FileChannels can be created for the system.

As I stated before, I think that there may be a need for a
java.nio.FileSystem implementation for the proxies, but I have not had the
time to dig into it yet.
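The CEP leaves the factory's exact shape to the implementation; a minimal sketch of what a pluggable factory contract could look like, with plain local storage as the default backend. All names here are hypothetical, not the CEP's actual API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

/** Hypothetical factory: maps a table path to a FileChannel on some backend. */
interface ChannelProxyFactory {
    FileChannel open(Path path, OpenOption... options) throws IOException;
}

/** Default implementation: plain local storage, i.e. current behaviour. */
class LocalChannelProxyFactory implements ChannelProxyFactory {
    @Override
    public FileChannel open(Path path, OpenOption... options) throws IOException {
        return FileChannel.open(path, options);
    }
}

public class ChannelProxyDemo {
    public static void main(String[] args) throws Exception {
        ChannelProxyFactory factory = new LocalChannelProxyFactory();
        Path f = Files.createTempFile("proxy", ".db");
        try (FileChannel ch = factory.open(f, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("sstable".getBytes(StandardCharsets.UTF_8)));
        }
        if (Files.size(f) != 7) throw new AssertionError("unexpected size");
    }
}
```

An S3-backed implementation would return channels opened through an s3fs-nio FileSystem instead of the default one, which is where the java.nio.FileSystem question above comes in.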

Claude


On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:

> In my mind, it may be better to support most cloud storage: AWS,
> Azure, GCP, Aliyun, and so on. We could make it pluggable, but in that case
> there may need to be a filesystem interface layer for object storage. And
> should we support distributed systems like HDFS, or something else? We
> should first discuss what should be done and what should not be done.
> Supporting only S3 feels a bit customized for a certain user
> and is not universal enough. Am I right?
>
> Claude Warren, Jr wrote on Tue, Sep 26, 2023 at 14:36:
>
>> My intention is to develop an S3 storage system using
>> https://github.com/carlspring/s3fs-nio
>>
>> There are several issues yet to be solved:
>>
>>1. There are some internal calls that create files in the table
>>directory that do not use the channel proxy.  I believe that these are
>>making calls on File objects.  I think those File objects are Cassandra
>>File objects not Java I/O File objects, but am unsure.
>>2. Determine if the carlspring s3fs-nio library will be performant
>>enough to work in the long run.  There may be issues with it:
>>   1. Downloading entire files before using them rather than using
>>   views into larger remotely stored files.
>>   2. Requiring a complete file to upload rather than using the
>>   partial upload capability of the S3 interface.
>>
>>
>>
>> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>>
>>> "Rather than building this piece by piece, I think it'd be awesome if
>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>> make sure we're discussing the whole final state, and not an implementation
>>> detail of one part of the final state?"
>>>
>>> Do agree with jeff for this ~~~ If these feature can be supported in oss
>>> cassandra , I think it will be very popular, whether in  a private
>>> deployment environment or a public cloud service (our experience can prove
>>> it). In addition, it is also a cost-cutting option for users too
>>>
>>> Jeff Jirsa wrote on Tue, Sep 26, 2023 at 00:11:
>>>
>>>>
>>>> - I think this is a great step forward.
>>>> - Being able to move sstables around between tiers of storage is a
>>>> feature Cassandra desperately needs, especially if one of those tiers is
>>>> some sort of object storage
>>>> - This looks like it's a foundational piece that enables that. Perhaps
>>>> by a team that's already implemented this end to end?
>>>> - Rather than building this piece by piece, I think it'd be awesome if
>>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>>> make sure we're discussing the whole final state, and not an implementation
>>>> detail of one part of the final state?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>>> outside of the standard storage space.
>>>>>
>>>>> There are two desires  driving this change:
>>>>>
>>>>>1. The ability to temporarily move some keyspaces/tables to
>>>>>storage outside the normal directory tree to other disk so that 
>>>>> compaction
>>>>>can occur in situations where there is not enough disk space for 
>>>>> compaction
>>>>>and the processing to the moved data can not be suspended.
>>>>>2. The ability to store infrequently used data on slower cheaper
>>>>>storage layers.
>>>>>
>>>>> I have a working POC implementation [2] though there are some issues
>>>>> still to be solved and much logging to be reduced.
>>>>>
>>>>> I look forward to productive discussions,
>>>>> Claude
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>>>
>>>>>
>>>>>
>>>
>>> --
>>> you are the apple of my eye !
>>>
>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
I spent a little (very little) time building an S3 implementation using an
Apache licensed S3 filesystem package.  I have not yet tested it but if
anyone is interested it is at
https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code I think the Cassandra File class needs to be
modified to ask the ChannelProxy for the default file system for the file
in question.  This should resolve some of the issues my original demo has
with some files being created in the data tree.  It may also handle many of
the cases for offline tools.


On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Would it be possible to make Jimfs integration production-ready then? I
> see we are using it in the tests already.
>
> It might be one of the reference implementations of this CEP. If there is
> a type of workload / type of nodes with plenty of RAM but no disk, some
> kind of compute nodes, it would just hold it all in memory and we might
> "flush" it to a cloud-based storage if rendered to be not necessary anymore
> (whatever that means).
>
> We could then completely bypass the memtables, as fetching data from an
> SSTable in memory would be roughly the same?
>
> On the other hand, that might be achieved by creating a ramdisk so I am
> not sure what exactly we would gain here. However, if it was eventually
> storing these SSTables in a cloud storage, we might "compact" "TWCS tables"
> automatically after so-and-so period by moving them there.
>
> 
> From: Jake Luciani 
> Sent: Tuesday, September 26, 2023 19:03
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias
> external storage locations
>
>
>
>
>
> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> FileSystems). There will probably also be backend-specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to other disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing to the moved data can not be suspended.
> > The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-28 Thread Claude Warren, Jr via dev
Sorry I was out sick and did not respond yesterday.

Henrik, how does your system work? What is the design strategy? Also, is
your code available somewhere?

After looking at the code some more, I think that the best solution is not a
FileChannelProxy but to modify the Cassandra File class to get a FileSystem
object from a Factory and build the Path that is used within that object.  I
think that this makes it a very small change that will pick up 90+% of the
cases.  We then just need to find the edge cases.





On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Super excited about this as well. Happy to help test with Azure and any
> other way needed.
>
> Thanks,
> German
> --
> *From:* guo Maxwell 
> *Sent:* Wednesday, September 27, 2023 7:38 PM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy
> to alias external storage locations
>
> Thanks. So I think a Jira can be created now, and I'd be happy to provide
> some help with this as well if needed.
>
> Henrik Ingo wrote on Thu, Sep 28, 2023 at 00:21:
>
> It seems I was volunteered to rebase the Astra implementation of this
> functionality (FileSystemProvider) onto Cassandra trunk (and publish it,
> of course). I'll try to get going today or tomorrow, so that this
> discussion can then benefit from having that code available for inspection,
> and potentially using it as a solution to this use case.
>
> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:
>
> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to creates paths for alternate
> backends where appropriate, but that works is probably necessary even with
> `ChanelProxyFactory` and munging UNIX paths (vs supporting multiple
> Fileystems). There will probably also be backend specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to other disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing to the moved data can not be suspended.
> > The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>
>
>
> --
>
> Henrik Ingo
>
> c. +358 40 569 7354
>
> w. www.datastax.com
>
>
>
>
> --
> you are the apple of my eye !
>


multiple ParameterizedClass objects?

2023-10-03 Thread Claude Warren, Jr via dev
I have a case where I would like to be able to specify a
collection of ParameterizedClass objects in the configuration file.  Is
there a standard way to do this?  If not, does anyone have a suggestion for
a clean way to implement it?
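One option worth noting: cassandra.yaml already expresses ParameterizedClass entries as a YAML list of class_name/parameters pairs for seed_provider, so a collection could reuse that shape. A hypothetical sketch — the top-level key and class names below are invented:

```yaml
# Hypothetical key; each entry mirrors the class_name/parameters shape
# that seed_provider already uses in cassandra.yaml.
channel_proxy_factories:
    - class_name: org.example.S3ChannelProxyFactory
      parameters:
          - bucket: "my-cold-bucket"
            region: "eu-west-1"
    - class_name: org.example.LocalChannelProxyFactory
      parameters:
          - root: "/var/lib/cassandra/data"
```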

Claude


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-10 Thread Claude Warren, Jr via dev
I have been exploring adding a second Path to the Cassandra File object.
The original path being the path within the standard Cassandra directory
tree and the second being a translated path when there is what was called a
ChannelProxy in place.

A problem arises when Directories.getLocationForDisk() is called.  It
seems to be looking for locations that start with the data directory's
absolute path.  I can change it to look for the original path rather than
the translated path, but in other cases the translated path is the one
that is needed.

I notice that there is a concept of multiple file locations in the code
base, particularly in the Directories.DataDirectories class where there are
"locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
constructor, and in the
DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
which returns an array of String and is populated from the cassandra.yaml
file.

The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  only
ever seems to return an array of one item.

Why does DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
return an array?

Should the system set the path to the root of the ColumnFamilyStore in the
ColumnFamilyStore directories instance?
Should the Directories.getLocationForDisk() do the proxy to the other file
system?

Where is the proper location to change from the standard internal
representation to the remote location?


On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr 
wrote:

> Sorry I was out sick and did not respond yesterday.
>
> Henrik,  How does your system work?  What is the design strategy?  Also is
> your code available somewhere?
>
> After looking at the code some more I think that the best solution is not
> a FileChannelProxy but to modify the Cassandra File class to get a
> FileSystem object for a Factory to build the Path that is used within that
> object.  I think that this makes if very small change that will pick up
> 90+% of the cases.  We then just need to find the edge cases.
>
>
>
>
>
> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
> dev@cassandra.apache.org> wrote:
>
>> Super excited about this as well. Happy to help test with Azure and any
>> other way needed.
>>
>> Thanks,
>> German
>> --
>> *From:* guo Maxwell 
>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>> *To:* dev@cassandra.apache.org 
>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy
>> to alias external storage locations
>>
>> Thanks , So I think a jira can be created now. And I'd be happy to
>> provide some help with this as well if needed.
>>
>> Henrik Ingo wrote on Thu, Sep 28, 2023 at 00:21:
>>
>> It seems I was volunteered to rebase the Astra implementation of this
>> functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
>> of course) I'll try to get going today or tomorrow, so that this
>> discussion can then benefit from having that code available for inspection.
>> And potentially using it as a soluttion to this use case.
>>
>> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:
>>
>> We (DataStax) have a FileSystemProvider for Astra we can provide.
>> Works with S3/GCS/Azure.
>>
>> I'll ask someone on our end to make it accessible.
>>
>> This would work by having a bucket prefix per node. But there are lots
>> of details needed to support things like out of bound compaction
>> (mentioned in CEP).
>>
>> Jake
>>
>> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>> >
>> > I agree with Ariel, the more suitable insertion point is probably the
>> JDK level FileSystemProvider and FileSystem abstraction.
>> >
>> > It might also be that we can reuse existing work here in some cases?
>> >
>> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>> >
>> > 
>> > Hi,
>> >
>> > Support for multiple storage backends including remote storage backends
>> is a pretty high value piece of functionality. I am happy to see there is
>> interest in that.
>> >
>> > I think that `ChannelProxyFactory` as an integration point is going to
>> quickly turn into a dead end as we get into really using multiple storage
>> backends. We need to be able to list files and really the full range of
>> filesystem interactions that Java supports should work with any backend to
>> make development, testing, and using existing code straightforward.
>> >
>> > It's a little more work to get C* to creates paths for alternate
>> backends where appropriate, but t

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread Claude Warren, Jr via dev
After a bit more analysis and some testing I have a new branch that I think
solves the problem. [1]  I have also created a pull request internal to my
clone so that it is easy to see the changes. [2]

The strategy change is to move the insertion of the proxy from the
Cassandra File class to the Directories class.  This means that all action
with the table is captured (this solves a problem encountered in the
earlier strategy).
The strategy is to create a path on a different FileSystem and return
that.  The example code only moves the data for the table to another
directory on the same FileSystem but using a different FileSystem
implementation should be a trivial change.

The current code works on an entire keyspace.  While code exists to limit
the redirect to a single table, I have not tested that branch yet and am not
certain that it will work.  There is also some code (i.e. the PathParser)
that may no longer be needed but has not been removed yet.

Please take a look and let me know if you see any issues with this solution.

Claude

[1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
[2] https://github.com/Claudenw/cassandra/pull/5/files
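The path translation this branch performs in the Directories class amounts to a prefix remap from the standard data directory onto another root. A standalone sketch of that translation under assumed names (the branch's actual code may differ):

```java
import java.nio.file.*;

public class PathRemapDemo {
    /**
     * Remap a location under the normal data directory onto another root.
     * The target here is another directory, but a root on another FileSystem
     * works the same way, since only String components cross the boundary.
     */
    static Path remap(Path dataDir, Path original, Path targetRoot) {
        Path relative = dataDir.relativize(original);
        Path result = targetRoot;
        for (Path part : relative) {
            result = result.resolve(part.toString()); // String, so cross-FS safe
        }
        return result;
    }

    public static void main(String[] args) {
        Path dataDir = Paths.get("/var/lib/cassandra/data");
        Path sstable = dataDir.resolve("ks1/tbl1/na-1-big-Data.db");
        Path remapped = remap(dataDir, sstable, Paths.get("/mnt/cold"));
        if (!remapped.equals(Paths.get("/mnt/cold/ks1/tbl1/na-1-big-Data.db")))
            throw new AssertionError(remapped.toString());
    }
}
```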



On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr 
wrote:

> I have been exploring adding a second Path to the Cassandra File object.
> The original path being the path within the standard Cassandra directory
> tree and the second being a translated path when there is what was called a
> ChannelProxy in place.
>
> A problem arises when the Directories.getLocationForDisk() is called.  It
> seems to be looking for locations that start with the data directory
> absolute path.   I can change it to make it look for the original path not
> the translated path.  But in other cases the translated path is the one
> that is needed.
>
> I notice that there is a concept of multiple file locations in the code
> base, particularly in the Directories.DataDirectories class where there are
> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
> constructor, and in the
> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
> which returns an array of String and is populated from the cassandra.yaml
> file.
>
> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  only
> ever seems to return an array of one item.
>
> Why does DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
> return an array?
>
> Should the system set the path to the root of the ColumnFamilyStore in the
> ColumnFamilyStore directories instance?
> Should the Directories.getLocationForDisk() do the proxy to the other file
> system?
>
> Where is the proper location to change from the standard internal
> representation to the remote location?
>
>
> On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr 
> wrote:
>
>> Sorry I was out sick and did not respond yesterday.
>>
>> Henrik,  How does your system work?  What is the design strategy?  Also
>> is your code available somewhere?
>>
>> After looking at the code some more I think that the best solution is not
>> a FileChannelProxy but to modify the Cassandra File class to get a
>> FileSystem object for a Factory to build the Path that is used within that
>> object.  I think that this makes if very small change that will pick up
>> 90+% of the cases.  We then just need to find the edge cases.
>>
>>
>>
>>
>>
>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> Super excited about this as well. Happy to help test with Azure and any
>>> other way needed.
>>>
>>> Thanks,
>>> German
>>> --
>>> *From:* guo Maxwell 
>>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>>> *To:* dev@cassandra.apache.org 
>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy
>>> to alias external storage locations
>>>
>>> Thanks , So I think a jira can be created now. And I'd be happy to
>>> provide some help with this as well if needed.
>>>
>>> Henrik Ingo wrote on Thu, Sep 28, 2023 at 00:21:
>>>
>>> It seems I was volunteered to rebase the Astra implementation of this
>>> functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,
>>> of course) I'll try to get going today or tomorrow, so that this
>>> discussion can then benefit from having that code available for inspection.
>>> And potentially using it as a soluttion to this use case.
>>>
>>> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani  wrote:
>>>
>>> We (DataStax) have a FileSystemProvid

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread Claude Warren, Jr via dev
Henrik and Guo,

Have you moved forward on this topic?  I have not seen anything recently.
I have posted a solution that intercepts calls for directories and injects
directories from different FileSystems.  This means that a node can have
keyspaces both on the local file system and one or more other FileSystem
implementations.

I look forward to hearing from you,
Claude


On Wed, Oct 18, 2023 at 9:00 AM Claude Warren, Jr 
wrote:

> After a bit more analysis and some testing I have a new branch that I
> think solves the problem. [1]  I have also created a pull request internal
> to my clone so that it is easy to see the changes. [2]
>
> The strategy change is to move the insertion of the proxy from the
> Cassandra File class to the Directories class.  This means that all action
> with the table is captured (this solves a problem encountered in the
> earlier strategy).
> The strategy is to create a path on a different FileSystem and return
> that.  The example code only moves the data for the table to another
> directory on the same FileSystem but using a different FileSystem
> implementation should be a trivial change.
>
> The current code works on an entire keyspace.  I, while code exists to
> limit the redirect to a table I have not tested that branch yet and am not
> certain that it will work.  There is also some code (i.e. the PathParser)
> that may no longer be needed but has not been removed yet.
>
> Please take a look and let me know if you see any issues with this
> solution.
>
> Claude
>
> [1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
> [2] https://github.com/Claudenw/cassandra/pull/5/files
>
>
>
> On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr 
> wrote:
>
>> I have been exploring adding a second Path to the Cassandra File object.
>> The original path being the path within the standard Cassandra directory
>> tree and the second being a translated path when there is what was called a
>> ChannelProxy in place.
>>
>> A problem arises when the Directories.getLocationForDisk() is called.  It
>> seems to be looking for locations that start with the data directory
>> absolute path.   I can change it to make it look for the original path not
>> the translated path.  But in other cases the translated path is the one
>> that is needed.
>>
>> I notice that there is a concept of multiple file locations in the code
>> base, particularly in the Directories.DataDirectories class where there are
>> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
>> constructor, and in the
>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
>> which returns an array of String and is populated from the cassandra.yaml
>> file.
>>
>> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
>> only ever seems to return an array of one item.
>>
>> Why does
>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  return an
>> array?
>>
>> Should the system set the path to the root of the ColumnFamilyStore in
>> the ColumnFamilyStore directories instance?
>> Should the Directories.getLocationForDisk() do the proxy to the other
>> file system?
>>
>> Where is the proper location to change from the standard internal
>> representation to the remote location?
>>
>>
>> On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr 
>> wrote:
>>
>>> Sorry I was out sick and did not respond yesterday.
>>>
>>> Henrik,  How does your system work?  What is the design strategy?  Also
>>> is your code available somewhere?
>>>
>>> After looking at the code some more I think that the best solution is
>>> not a FileChannelProxy but to modify the Cassandra File class to get a
>>> FileSystem object for a Factory to build the Path that is used within that
>>> object.  I think that this makes if very small change that will pick up
>>> 90+% of the cases.  We then just need to find the edge cases.
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> Super excited about this as well. Happy to help test with Azure and any
>>>> other way needed.
>>>>
>>>> Thanks,
>>>> German
>>>> --
>>>> *From:* guo Maxwell 
>>>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>>>> *To:* dev@cassandra.apache.org 
>>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable
>

Re: [DISCUSS] CommitLog default disk access mode

2023-10-18 Thread Claude Warren, Jr via dev
I think introducing the feature is a good idea.
I also think that it should _NOT_ be enabled by default for all the reasons
stated above.
Finding a cohort of users who are interested in turning it on would provide
a nice testbed to shake out any issues without affecting everyone.

On Tue, Oct 17, 2023 at 3:58 PM C. Scott Andreas 
wrote:

> Let’s please not change the default at the same time the feature is
> introduced.
>
> Making the capability available will allow users to evaluate and quantify
> the benefit of the work, as well as to call out any unintended
> consequences. As users and the project gain confidence in the results, we
> can evaluate changing the default.
>
> – Scott
>
> On Oct 17, 2023, at 4:25 AM, guo Maxwell  wrote:
>
> 
> -1
>
> I still think we should keep it as it is until the  direct io
> for commitlog (read and write) is ready and relatively stable. And then we
> may change the default value to direct io from mmap in a future version,
> such as 5.2, or 6.0.
>
> Pawar, Amit wrote on Tue, Oct 17, 2023 at 19:03:
>
>> [AMD Official Use Only - General]
>>
>> Thank you all for your input. Received total 6 replies and below is the
>> summary.
>>
>>
>>
>> 1. Mmap   : 2/6
>>
>> 2. Direct-I/O : 4/6
>>
>>
>>
>> Default should be changed to Direct-IO then ? please confirm.
>>
>>
>>
>> Thanks,
>>
>> Amit
>>
>>
>>
>> Strongly agree with this point of view that direct IO  can bring great
>> benefits.
>>
>>
>>
>> I have reviewed part of the code, and my preliminary judgment is that it
>> is not very common and limited in some situations; for example, it works
>> for commitlog's write path only in this patch.  So I suggest that the
>> default value should not be modified until the entire function is
>> comprehensive and stable, and then modified in a future version.
>>
>>
>>
>> Sam wrote on Tue, Oct 17, 2023 at 05:39:
>>
>> *Glad you brought up compaction here - I think there would be a
>> significant benefit to moving compaction to direct i/o.*
>>
>>
>>
>> +1. Would love to see this get traction.
>>
>>
>>
>> On Mon, 16 Oct 2023 at 19:38, Jon Haddad 
>> wrote:
>>
>> Glad you brought up compaction here - I think there would be a
>> significant benefit to moving compaction to direct i/o.
>>
>>
>> On 2023/10/16 16:14:28 Benedict wrote:
>> > I have some plans to (eventually) use the commit log as memtable
>> payload storage (ie memtables would reference the commit log entries
>> directly, storing only indexing info), and to back first level of sstables
>> by reference to commit log entries. This will permit us to deliver not only
>> much bigger memtables (cutting compaction throughput requirements by the
>> ratio of size increase - so pretty dramatically), and faster flushing (so
>> better behaviour during write bursts), but also a fairly cheap and simple way
>> to support MVCC - which will be helpful for transaction throughput.
>> >
>> > There is also a new commit log (“journal”) coming with Accord, that the
>> rest of C* may or may not transition to.
>> >
>> > I only say this because this makes the utility of direct IO for commit
>> log suspect, as we will be reading from the files as a matter of course
>> should this go ahead; and we may end up relying on a different commit log
>> implementation before long anyway.
>> >
>> > This is obviously a big suggestion and is not guaranteed to transpire,
>> and probably won’t within the next year or so, but it should perhaps form
>> some minimal part of any calculus. If the patch is otherwise simple and
>> beneficial I don’t have anything against it, and the use of direct IO could
>> well be of benefit eg in compaction - and also in future if we manage to
>> bring a page management in process. So laying foundations there could be of
>> benefit, even if the commit log eventually does not use it.
>> >
>> > > On 16 Oct 2023, at 17:00, Jon Haddad 
>> wrote:
>> > >
>> > > I haven't looked at the patch, but at a high level, defaulting to
>> direct I/O for commit logs makes a lot of sense to me.
>> > >
>> > >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
>> > >> [Public]
>> > >>
>> > >> Hi,
>> > >>
>> > >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO
>> feature is proposed through new PR[1] to improve the CommitLog IO speed.
>> Enabling this by default could be useful feature to address IO bottleneck
>> seen during peak load.
>> > >>
>> > >> Need your input regarding changing this default. Please suggest.
>> > >>
>> > >> https://issues.apache.org/jira/browse/CASSANDRA-18464
>> > >>
>> > >> thanks,
>> > >> Amit Pawar
>> > >>
>> > >> [1] - https://github.com/apache/cassandra/pull/2777
>> > >>
>> >
>>
>>
>>
>>
>> --
>>
>> you are the apple of my eye !
>>
>
>
> --
> you are the apple of my eye !
>
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-19 Thread Claude Warren, Jr via dev
How does Henrik's implementation work?

My implementation is simple and in testing causes no side effects.  The
code simply intercepts the calls to
`Directories.DataDirectories.getDataDirectoriesFor(TableMetadata)` to
inject a DataDirectory containing the path to a different FileSystem when
appropriate.  The only other change is a similar change to
`Directories.DataDirectories.getAllDirectories()` that inserts the data
directories for the different FileSystem.
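A minimal sketch of that intercept, using simplified stand-in types rather than the real Cassandra `Directories.DataDirectory` and `TableMetadata` classes (the mapper keyed by keyspace is an assumption for illustration):

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: DataDirectory and the keyspace->path map are stand-ins for
// the Cassandra classes named above, not the actual implementation.
public class DirectoryIntercept {
    static class DataDirectory {
        final Path location;
        DataDirectory(Path location) { this.location = location; }
    }

    private final List<DataDirectory> defaults;
    private final Map<String, Path> mappedKeyspaces; // keyspace -> alternate FileSystem path

    DirectoryIntercept(List<DataDirectory> defaults, Map<String, Path> mappedKeyspaces) {
        this.defaults = defaults;
        this.mappedKeyspaces = mappedKeyspaces;
    }

    // Analogue of getDataDirectoriesFor(TableMetadata): inject a DataDirectory
    // pointing at a different FileSystem when the keyspace is mapped.
    List<DataDirectory> getDataDirectoriesFor(String keyspace) {
        Path mapped = mappedKeyspaces.get(keyspace);
        return mapped == null ? defaults : List.of(new DataDirectory(mapped));
    }

    // Analogue of getAllDirectories(): the defaults plus every mapped location.
    List<DataDirectory> getAllDirectories() {
        List<DataDirectory> all = new ArrayList<>(defaults);
        mappedKeyspaces.values().forEach(p -> all.add(new DataDirectory(p)));
        return all;
    }
}
```

The point of routing both calls through the same map is that a node can serve some keyspaces from the local file system and others from one or more alternate FileSystem implementations.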

From notes earlier in this thread, it seems that Henrik's implementation is
a FileSystemProvider.  If that is the case then an implementation of
"FileSystemMapperHandler" would wrap the provider to expose the
functionality necessary for the two intercepts I mentioned above.

Do I understand this correctly?  Am I missing something?  Is there a reason
this solution will not work?

Claude

On Wed, Oct 18, 2023 at 10:06 AM guo Maxwell  wrote:

>
> If it is ok for Henrik to rebase the Astra implementation of this
> functionality (FileSystemProvider) onto Cassandra trunk.
>
> Then we can create a jira to move this forward for a small step.
>
> Claude Warren, Jr wrote on Wed, Oct 18, 2023 at 15:05:
>
>> Henrik and Guo,
>>
>> Have you moved forward on this topic?  I have not seen anything
>> recently.  I have posted a solution that intercepts calls for directories
>> and injects directories from different FileSystems.  This means that a node
>> can have keyspaces both on the local file system and one or more other
>> FileSystem implementations.
>>
>> I look forward to hearing from you,
>> Claude
>>
>>
>> On Wed, Oct 18, 2023 at 9:00 AM Claude Warren, Jr 
>> wrote:
>>
>>> After a bit more analysis and some testing I have a new branch that I
>>> think solves the problem. [1]  I have also created a pull request internal
>>> to my clone so that it is easy to see the changes. [2]
>>>
>>> The strategy change is to move the insertion of the proxy from the
>>> Cassandra File class to the Directories class.  This means that all action
>>> with the table is captured (this solves a problem encountered in the
>>> earlier strategy).
>>> The strategy is to create a path on a different FileSystem and return
>>> that.  The example code only moves the data for the table to another
>>> directory on the same FileSystem but using a different FileSystem
>>> implementation should be a trivial change.
>>>
>>> The current code works on an entire keyspace.  While code exists to
>>> limit the redirect to a single table, I have not tested that branch yet
>>> and am not certain that it will work.  There is also some code (i.e. the
>>> PathParser) that may no longer be needed but has not been removed yet.
>>>
>>> Please take a look and let me know if you see any issues with this
>>> solution.
>>>
>>> Claude
>>>
>>> [1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
>>> [2] https://github.com/Claudenw/cassandra/pull/5/files
>>>
>>>
>>>
>>> On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr <
>>> claude.war...@aiven.io> wrote:
>>>
>>>> I have been exploring adding a second Path to the Cassandra File
>>>> object.  The original path being the path within the standard Cassandra
>>>> directory tree and the second being a translated path when there is what
>>>> was called a ChannelProxy in place.
>>>>
>>>> A problem arises when the Directories.getLocationForDisk() is called.
>>>> It seems to be looking for locations that start with the data directory
>>>> absolute path.   I can change it to make it look for the original path not
>>>> the translated path.  But in other cases the translated path is the one
>>>> that is needed.
>>>>
>>>> I notice that there is a concept of multiple file locations in the code
>>>> base, particularly in the Directories.DataDirectories class where there are
>>>> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
>>>> constructor, and in the
>>>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
>>>> which returns an array of String and is populated from the cassandra.yaml
>>>> file.
>>>>
>>>> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
>>>> only ever seems to return an array of one item.
>>>>
>>>> Why does
>>>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  return an
>>>> array?

CASSANDRA-18775 (Cassandra supported OSs)

2023-10-20 Thread Claude Warren, Jr via dev
I am looking at https://issues.apache.org/jira/browse/CASSANDRA-18775 and
want to ensure that I do not remove too many libraries.

I think that preserving any sigar library where the file name contains the
word "linux" or "macosx" should be acceptable.  This will preserve:
libsigar-amd64-linux.so
libsigar-ia64-linux.so
libsigar-ppc-linux.so
libsigar-ppc64-aix-5.so
libsigar-ppc64-linux.so
libsigar-ppc64le-linux.so
libsigar-s390x-linux.so
libsigar-universal-macosx.dylib
libsigar-universal64-macosx.dylib
libsigar-x86-linux.so

and remove:

libsigar-amd64-freebsd-6.so
libsigar-amd64-solaris.so
libsigar-ia64-hpux-11.sl
libsigar-pa-hpux-11.sl
libsigar-ppc-aix-5.so
libsigar-sparc-solaris.so
libsigar-sparc64-solaris.so
libsigar-x86-freebsd-5.so
libsigar-x86-freebsd-6.so
libsigar-x86-solaris.so

resulting in a savings of 3,105,384 bytes out of 6,450,526 from the
/lib/sigar-bin directory, a 48% reduction.
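The proposed rule is just a substring check on the library file name; a minimal sketch (class name hypothetical):

```java
import java.util.List;

// Sketch of the proposed rule: keep any sigar native library whose file
// name contains "linux" or "macosx", and drop the rest.
public class SigarLibFilter {
    static boolean keep(String fileName) {
        return fileName.contains("linux") || fileName.contains("macosx");
    }

    static long keptCount(List<String> fileNames) {
        return fileNames.stream().filter(SigarLibFilter::keep).count();
    }
}
```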

Does anyone see any reason _not_ to do this?


CASSANDRA-16565

2023-10-24 Thread Claude Warren, Jr via dev
I am working on https://issues.apache.org/jira/browse/CASSANDRA-16565 and
have a small testing program that executes the sigar and equivalent OSHI
methods to verify that they are the same.  I would like to have this run on
various platforms.

I have tgz with all the libraries and code as well as a run.sh bash script
to compile and execute it.

Is there someplace I can stash the tgz that others can download it from?
The file info is : 6269066 okt 24 12:46 compare_oshi_sigar.tgz

OSHI does not implement all the methods that Sigar does and there is a
difference in the bitness (32/64 bit) of the results.

*Maximum Virtual Memory*

Sigar dates  from the time of 32 bit OS and so when checking for things
like maximum virtual memory it returns -1 (INFINITE) for any value over
0x7fffffff.

OSHI on the other hand is 64 bit aware and will return long values for the
maximum virtual memory.  Looking at tools like "ulimit" it converts
anything over 0x7fffffff to the text "infinite" in the return.

To handle this I set the expected virtual memory to 0x7FFFFFFFL and accept
any value >= that, or -1, as the acceptable value.
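That acceptance rule can be sketched as follows (class and method names are mine, not the test program's):

```java
// Sketch of the acceptance rule described above: treat Sigar's -1 (infinite)
// and any OSHI value at or above the 32-bit signed ceiling as equivalent.
public class VirtualMemoryCheck {
    static final long SIGAR_INFINITE = -1L;
    static final long EXPECTED_MAX = 0x7fffffffL; // 32-bit signed maximum

    static boolean acceptable(long maxVirtualMemory) {
        return maxVirtualMemory == SIGAR_INFINITE || maxVirtualMemory >= EXPECTED_MAX;
    }
}
```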

*Maximum Processes*

This appears to be the value of "ulimit -u" which is not supported in
OSHI.  However, on Linux (and I think Mac OS) we can make a call to the
bash interpreter to return "ulimit -u".  On other systems I log that we
don't have a way to get this value and return the standard default max
processes of 1024.  This will cause the "warnIfRunningInDegradedMode" to
warn about possible degradation.
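A sketch of that fallback, shelling out to bash where available (names and error handling are mine, not the actual patch):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Sketch of the fallback described above: ask the shell for "ulimit -u" on
// platforms where that works, otherwise return the standard default of 1024.
public class MaxProcesses {
    static final long DEFAULT_MAX_PROCESSES = 1024;

    static long maxProcesses() {
        try {
            Process p = new ProcessBuilder("bash", "-c", "ulimit -u").start();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line = r.readLine();
                if (line == null)
                    return DEFAULT_MAX_PROCESSES;
                line = line.trim();
                // "unlimited" maps to -1, matching Sigar's convention.
                return line.equals("unlimited") ? -1 : Long.parseLong(line);
            }
        } catch (Exception e) {
            // No bash (e.g. Windows): fall back to the default and let
            // warnIfRunningInDegradedMode report possible degradation.
            return DEFAULT_MAX_PROCESSES;
        }
    }
}
```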

Claude


Re: CASSANDRA-16565

2023-10-25 Thread Claude Warren, Jr via dev
I ended up posting the code at
https://github.com/Aiven-Labs/compare_oshi_sigar if anyone wants to take a
look and see if they get differing results on various systems.

On Tue, Oct 24, 2023 at 4:59 PM Brandon Williams  wrote:

> On Tue, Oct 24, 2023 at 7:48 AM Claude Warren, Jr via dev
>  wrote:
> >
> > Is there someplace I can stash the tgz that others can download it from?
> The file info is : 6269066 okt 24 12:46 compare_oshi_sigar.tgz
>
> Is posting a github branch not sufficient?
>


Development Dependencies documentation.

2023-10-25 Thread Claude Warren, Jr via dev
I just had to change dependencies in Cassandra for the first  time and I
think the documentation [1] is out of date.

First I think most of the file edits are in the ".build" directory.  Adding
jars to the "lib" directory works until calling "ant realclean", so perhaps
the instructions should include regenerating the "lib" folder after making
the edits.

If I am wrong please let me know, otherwise I will open a ticket and update
the documentation.

[1] https://cassandra.apache.org/_/development/dependencies.html


Re: CASSANDRA-18775 (Cassandra supported OSs)

2023-10-25 Thread Claude Warren, Jr via dev
I closed 18775 as it did not seem reasonable after discussions here.  I
have been working on 16565 and have a pull request [1] and an experimental
suite to show the differences. [2]

[1]  https://github.com/apache/cassandra/pull/2842
[2]  https://github.com/Aiven-Labs/compare_oshi_sigar

On Wed, Oct 25, 2023 at 2:59 PM Josh McKenzie  wrote:

> +1 to drop if we're not using.
>
> On Fri, Oct 20, 2023, at 6:58 PM, Ekaterina Dimitrova wrote:
>
> +1 on removal the whole lib if we are sure we don’t need it. Nothing
> better than some healthy house cleaning
>
>  -1 on partial removals
>
> On Fri, 20 Oct 2023 at 17:34, David Capwell  wrote:
>
> +1 to drop the whole lib…
>
>
> On Oct 20, 2023, at 7:55 AM, Jeremiah Jordan 
> wrote:
>
> Agreed.  -1 on selectively removing any of the libs.  But +1 for removing
> the whole thing if it is no longer used.
>
> -Jeremiah
>
> On Oct 20, 2023 at 9:28:55 AM, Mick Semb Wever  wrote:
>
> Does anyone see any reason _not_ to do this?
>
>
>
> Thanks for bring this to dev@
>
> I see reason not to do it, folk do submit patches for other archs despite
> us not formally maintaining and testing the code for those archs.  Some
> examples are PPC64 Big Endian (CASSANDRA-7476), s390x (CASSANDRA-17723),
> PPC64 Little Endian (CASSANDRA-7381), sparcv9 (CASSANDRA-6628).  Wrote this
> on the ticket too.
>
> +1 for removing sigar altogether (as Brandon points out).
>
>
>


Immediately Deprecated Code

2023-10-31 Thread Claude Warren, Jr via dev
I was thinking about code that is used to migrate from one version to
another.  For example the code that rewrote the order of the hash values
used for Bloom filters.  That code was necessary for the version it was
coded in.  But the next version does not need that code because the next
version is not going to read the data from 2 versions prior to itself.  So
the code could be removed for version+1.

So, would it have made sense to annotate those methods (and variables) as
deprecated since the version they were written in so the methods/variables
can be removed in the next version?

If so, what I propose is that all transitional methods and variables be
marked as deprecated with the version in which they were introduced.


Re: Immediately Deprecated Code

2023-10-31 Thread Claude Warren, Jr via dev
Good point.  When I was thinking about this originally I did realize that
the deprecated tag would need a since = v+1 but I neglected to note that in
my original post.
So in your example the code would be marked as deprecated since v5.0 even
though the code base it is being written in is 4.0.  Thus the code would
not be a candidate for removal until 6.0.
I think that this makes it easier to remember all those bits that can be
deleted later.

On Tue, Oct 31, 2023 at 11:57 AM Miklosovic, Stefan via dev <
dev@cassandra.apache.org> wrote:

> Do I understand it correctly that this is basically the case of
> "deprecated on introduction" as we know that it will not be necessary the
> very next version?
>
> I think that not everybody is upgrading from version to version as they
> appear. If somebody upgrades from 4.0 to 5.1 (which we seem to support) (1)
> and you would have introduced the deprecation in 4.0 with intention to
> remove it in 5.0 and somebody jumps to 5.1 straight away, is not that a
> problem? Because they have not made the move via 5.0 where you upgrade
> logic was triggered.
>
> (1)
> https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.java#L97-L108
>
> ____
> From: Claude Warren, Jr via dev 
> Sent: Tuesday, October 31, 2023 10:57
> To: dev
> Cc: Claude Warren, Jr
> Subject: Immediately Deprecated Code
>
> NetApp Security WARNING: This is an external email. Do not click links or
> open attachments unless you recognize the sender and know the content is
> safe.
>
>
>
> I was thinking about code that is used to migrate from one version to
> another.  For example the code that rewrote the order of the hash values
> used for Bloom filters.  That code was necessary for the version it was
> coded in.  But the next version does not need that code because the next
> version is not going to read the data from 2 versions prior to itself.  So
> the code could be removed for verson+1.
>
> So, would it have made sense to annotate those methods (and variables) as
> deprecated since the version they were written in so the methods/variables
> can be removed in the next version?
>
> If so, what I propose is that all transitional methods and variable be
> marked as deprecated with the version in which they were introduced.
>
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-31 Thread Claude Warren, Jr via dev
@henrik,  Have you made any progress on this?  I would like to help drive
it forward but I am waiting to see what your code looks like and figure out
what I need to do.  Any update on timeline would be appreciated.

On Mon, Oct 23, 2023 at 9:07 PM Jon Haddad 
wrote:

> I think this is a great more generally useful than the two scenarios
> you've outlined.  I think it could / should be possible to use an object
> store as the primary storage for sstables and rely on local disk as a cache
> for reads.
>
> I don't know the roadmap for TCM, but imo if it allowed for more stable,
> pre-allocated ranges that compaction will always be aware of (plus a bunch
> of plumbing I'm deliberately avoiding the details on), then you could
> bootstrap a new node by copying s3 directories around rather than streaming
> data between nodes.  That's how we get to 20TB / node, easy scale up /
> down, etc, and always-ZCS for non-object store deployments.
>
> Jon
>
> On 2023/09/25 06:48:06 "Claude Warren, Jr via dev" wrote:
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of
> > the standard storage space.
> >
> > There are two desires  driving this change:
> >
> >1. The ability to temporarily move some keyspaces/tables to storage
> >outside the normal directory tree to other disk so that compaction can
> >occur in situations where there is not enough disk space for
> compaction and
> >the processing to the moved data can not be suspended.
> >2. The ability to store infrequently used data on slower cheaper
> storage
> >layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still
> > to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
>


Re: Immediately Deprecated Code

2023-10-31 Thread Claude Warren, Jr via dev
In your example 5.1 can read 4.x because 4.0 (?) is the earliest version
that 5.x supports.  I don't think you can upgrade directly from 3.x to 5.x
without an intermediate stop at some version of 4.x can you?  So when we
get to 6.x we won't need the 4 -> 5 conversion code because 6 will only
support reading from  5.  If I am incorrect and we expect a version to be
able to read 2 major versions back then indeed the deprecated since would
be 2 major versions ahead of the version when the code was written.

On Tue, Oct 31, 2023 at 2:40 PM Andrew Weaver 
wrote:

> Skipping versions on upgrade is absolutely something that happens in the
> real world.  This is particularly highlighted by the discussion around
> 5.0/5.1 that's been happening - 5.0 has been described as a potential
> "ghost version" which I completely understand.
>
> Getting rid of some of the old cruft that seems unnecessary (and strictly
> speaking is unnecessary) is not without its downsides.  In this case, that
> cruft improves the user experience.
>
> On Tue, Oct 31, 2023 at 5:56 AM Miklosovic, Stefan via dev <
> dev@cassandra.apache.org> wrote:
>
>> Do I understand it correctly that this is basically the case of
>> "deprecated on introduction" as we know that it will not be necessary the
>> very next version?
>>
>> I think that not everybody is upgrading from version to version as they
>> appear. If somebody upgrades from 4.0 to 5.1 (which we seem to support) (1)
>> and you would have introduced the deprecation in 4.0 with intention to
>> remove it in 5.0 and somebody jumps to 5.1 straight away, is not that a
>> problem? Because they have not made the move via 5.0 where you upgrade
>> logic was triggered.
>>
>> (1)
>> https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.java#L97-L108
>>
>> 
>> From: Claude Warren, Jr via dev 
>> Sent: Tuesday, October 31, 2023 10:57
>> To: dev
>> Cc: Claude Warren, Jr
>> Subject: Immediately Deprecated Code
>>
>> NetApp Security WARNING: This is an external email. Do not click links or
>> open attachments unless you recognize the sender and know the content is
>> safe.
>>
>>
>>
>> I was thinking about code that is used to migrate from one version to
>> another.  For example the code that rewrote the order of the hash values
>> used for Bloom filters.  That code was necessary for the version it was
>> coded in.  But the next version does not need that code because the next
>> version is not going to read the data from 2 versions prior to itself.  So
>> the code could be removed for verson+1.
>>
>> So, would it have made sense to annotate those methods (and variables) as
>> deprecated since the version they were written in so the methods/variables
>> can be removed in the next version?
>>
>> If so, what I propose is that all transitional methods and variable be
>> marked as deprecated with the version in which they were introduced.
>>
>>
>
> --
> Andrew Weaver
>


Re: Immediately Deprecated Code

2023-11-01 Thread Claude Warren, Jr via dev
My thought was that we have code that is intended to be used for a specific
time frame.  We should clean up the code base when that code is no longer
used.  But we don't have any good way to track that.  This proposal was an
attempt to provide signposts for removing such code.

On Tue, Oct 31, 2023 at 6:56 PM Mick Semb Wever  wrote:

>
> For online upgrades we support skipping majors so long as the majors are
> adjacent.
> That is, any 4.x.z to any 5.x.z
>  ( Is it recommended that you always first patch upgrade the .z to the
> latest before the major upgrade. )
>
> For offline upgrades, we are aiming to maintain all compatibility.
>
> Take care when removing code, there are various (serdes) classes that look
> like they are for other components but are also used in the storage engine.
>
>
>
> On Tue, 31 Oct 2023 at 18:42, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> In your example 5.1 can read 4.x because 4.0 (?) is the earliest version
>> that 5.x supports.  I don't think you can upgrade directly from 3.x to 5.x
>> without an intermediate stop at some version of 4.x can you?  So when we
>> get to 6.x we won't need the 4 -> 5 conversion code because 6 will only
>> support reading from  5.  If I am incorrect and we expect a version to be
>> able to read 2 major versions back then indeed the deprecated since would
>> be 2 major versions ahead of the version when the code was written.
>>
>> On Tue, Oct 31, 2023 at 2:40 PM Andrew Weaver 
>> wrote:
>>
>>> Skipping versions on upgrade is absolutely something that happens in the
>>> real world.  This is particularly highlighted by the discussion around
>>> 5.0/5.1 that's been happening - 5.0 has been described as a potential
>>> "ghost version" which I completely understand.
>>>
>>> Getting rid of some of the old cruft that seems unnecessary (and
>>> strictly speaking is unnecessary) is not without its downsides.  In this
>>> case, that cruft improves the user experience.
>>>
>>> On Tue, Oct 31, 2023 at 5:56 AM Miklosovic, Stefan via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> Do I understand it correctly that this is basically the case of
>>>> "deprecated on introduction" as we know that it will not be necessary the
>>>> very next version?
>>>>
>>>> I think that not everybody is upgrading from version to version as they
>>>> appear. If somebody upgrades from 4.0 to 5.1 (which we seem to support) (1)
>>>> and you would have introduced the deprecation in 4.0 with intention to
>>>> remove it in 5.0 and somebody jumps to 5.1 straight away, is not that a
>>>> problem? Because they have not made the move via 5.0 where you upgrade
>>>> logic was triggered.
>>>>
>>>> (1)
>>>> https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTestBase.java#L97-L108
>>>>
>>>> 
>>>> From: Claude Warren, Jr via dev 
>>>> Sent: Tuesday, October 31, 2023 10:57
>>>> To: dev
>>>> Cc: Claude Warren, Jr
>>>> Subject: Immediately Deprecated Code
>>>>
>>>> NetApp Security WARNING: This is an external email. Do not click links
>>>> or open attachments unless you recognize the sender and know the content is
>>>> safe.
>>>>
>>>>
>>>>
>>>> I was thinking about code that is used to migrate from one version to
>>>> another.  For example the code that rewrote the order of the hash values
>>>> used for Bloom filters.  That code was necessary for the version it was
>>>> coded in.  But the next version does not need that code because the next
>>>> version is not going to read the data from 2 versions prior to itself.  So
>>>> the code could be removed for verson+1.
>>>>
>>>> So, would it have made sense to annotate those methods (and variables)
>>>> as deprecated since the version they were written in so the
>>>> methods/variables can be removed in the next version?
>>>>
>>>> If so, what I propose is that all transitional methods and variable be
>>>> marked as deprecated with the version in which they were introduced.
>>>>
>>>>
>>>
>>> --
>>> Andrew Weaver
>>>
>>


Re: [DISCUSS] CASSANDRA-19104: Standardize tablestats formatting and data units

2023-12-04 Thread Claude Warren, Jr via dev
Why not change the option so that -H will operate as it does now, while -Hn
(where n is a digit) will limit the number of decimal places to n?
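A sketch of that suggested -Hn behaviour (the flag and class are hypothetical, not the current nodetool code):

```java
import java.util.Locale;

// Sketch of the suggested -Hn behaviour: format a byte count in
// human-readable units with a caller-chosen number of decimal places.
public class HumanReadable {
    static final String[] UNITS = { "B", "KiB", "MiB", "GiB", "TiB", "PiB" };

    static String format(long bytes, int decimals) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%." + decimals + "f %s", value, UNITS[unit]);
    }
}
```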

On Mon, Dec 4, 2023 at 5:11 PM Brad  wrote:

> Thanks, Jacek.  Using three significant digits for disk space is a good
> suggestion.
>
> On Mon, Dec 4, 2023 at 9:58 AM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> This looks great,
>>
>> I'd consider limiting the number of significant digits to 3 in the human
>> readable format. In the above example it would translate to:
>>
>> Space used (live): 1.46 TiB
>> Space used (total): 1.46 TiB
>>
>> Bytes repaired: 0.00 KiB
>> Bytes unrepaired: 4.31 TiB
>> Bytes pending repair: 0.000 KiB
>>
>> I just think with human readable format we just expect to have a grasp
>> view of the stats and 4th significant digit has very little meaning in that
>> case.
>>
>>
>> thanks,
>> Jacek
>>
>>


Re: Custom FSError and CommitLog Error Handling

2023-12-12 Thread Claude Warren, Jr via dev
I can see this as a strong improvement in Cassandra management and support
it.

+1 non binding

On Mon, Dec 11, 2023 at 8:28 PM Raymond Huffman 
wrote:

> Hello All,
>
> On our fork of Cassandra, we've implemented some custom behavior for
> handling CommitLog and SSTable Corruption errors. Specifically, if a node
> detects one of those errors, we want the node to stop itself, and if the
> node is restarted, we want initialization to fail. This is useful in
> Kubernetes when you expect nodes to be restarted frequently and makes our
> corruption remediation workflows less error-prone. I think we could make
> this behavior more pluggable by allowing users to provide custom
> implementations of the FSErrorHandler, and the error handler that's
> currently implemented at
> org.apache.cassandra.db.commitlog.CommitLog#handleCommitError via config in
> the same way one can provide custom Partitioners and
> Authenticators/Authorizers.
>
> Would you take as a contribution one of the following?
> 1. user provided implementations of FSErrorHandler and
> CommitLogErrorHandler, set via config; and/or
> 2. new commit failure and disk failure policies that write a poison pill
> file to disk and fail on startup if that file exists
>
> The poison pill implementation is what we currently use - we call this a
> "Non Transient Error" and we want these states to always require manual
> intervention to resolve, including manual action to clear the error. I'd be
> happy to contribute this if other users would find it beneficial. I had
> initially shared this question in Slack, but I'm now sharing it here for
> broader visibility.
>
> -Raymond Huffman
>


[DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-14 Thread Claude Warren, Jr via dev
Greetings,

I have submitted a pull request[1] that replaces the unsupported Sigar
library with the maintained OSHI library.

OSHI is an MIT licensed library that provides information about the
underlying OS much like Sigar did.

The change adds a dependency on oshi-core at the following coordinates:

  <dependency>
    <groupId>com.github.oshi</groupId>
    <artifactId>oshi-core</artifactId>
    <version>6.4.6</version>
  </dependency>

In addition to switching to a supported library, this change will reduce
the size of the package as the native Sigar libraries are removed.

Are there objections to making this switch and adding a new dependency?

[1] https://github.com/apache/cassandra/pull/2842/files
[2] https://issues.apache.org/jira/browse/CASSANDRA-16565


Re: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-17 Thread Claude Warren, Jr via dev
Can I get an another review/approval for the pull request?
https://github.com/apache/cassandra/pull/2842/files

On Fri, Dec 15, 2023 at 4:04 AM guo Maxwell  wrote:

> +1 too
>
> Mick Semb Wever wrote on Fri, Dec 15, 2023 at 10:01:
>
>>
>>
>>
>>>
>>> Are there objections to making this switch and adding a new dependency?
>>>
>>> [1] https://github.com/apache/cassandra/pull/2842/files
>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-16565
>>>
>>
>>
>>
>> +1 to removing sigar and to add oshi-core
>>
>>


Re: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-18 Thread Claude Warren, Jr via dev
The pull request is : https://github.com/apache/cassandra/pull/2842

On Mon, Dec 18, 2023 at 10:26 AM Mick Semb Wever  wrote:

>
>
> Can I get an another review/approval for the pull request?
>> https://github.com/apache/cassandra/pull/2842/files
>>
>
>
> It is not clear on the ticket what is being finally proposed, or what is
> to be reviewed, ref: https://github.com/Claudenw/cassandra/pull/6
> Keeping the ticket up to date makes this easier.
>
> btw, I am in the process of reviewing
> https://github.com/apache/cassandra/compare/trunk...instaclustr:cassandra:CASSANDRA-16565-review
>
>


Re: Call for Presentations closing soon: Community over Code EU 2024

2024-01-09 Thread Claude Warren, Jr via dev
Additionally, if you have a talk about some underlying technology that
could be applicable across multiple projects submit it or a poster based on
it.  We are looking for good cross-project presentations.

Claude
Chair, Community over Code, EU 2024.

On Mon, Jan 8, 2024 at 8:24 PM Paulo Motta  wrote:

> I wanted to remind that the call for speakers for Community Over Code EU
> 2024 (formerly Apachecon EU) will be closing this Friday 2024/01/12
> 23:59:59 GMT.
>
> If you reside in Europe/EMEA and have an interesting talk proposal about
> using, deploying or modifying Apache Cassandra please see details below to
> submit a proposal to this conference.
>
> -- Forwarded message -
> From: Ryan Skraba 
> Date: Mon, Oct 30, 2023 at 1:07 PM
> Subject: Call for Presentations now open: Community over Code EU 2024
> To:
>
>
> (Note: You are receiving this because you are subscribed to the dev@
> list for one or more projects of the Apache Software Foundation.)
>
> It's back *and* it's new!
>
> We're excited to announce that the first edition of Community over
> Code Europe (formerly known as ApacheCon EU) which will be held at the
> Radisson Blu Carlton Hotel in Bratislava, Slovakia from June 03-05,
> 2024! This eagerly anticipated event will be our first live EU
> conference since 2019.
>
> The Call for Presentations (CFP) for Community Over Code EU 2024 is
> now open at https://eu.communityovercode.org/blog/cfp-open/,
> and will close 2024/01/12 23:59:59 GMT.
>
> We welcome submissions on any topic related to the Apache Software
> Foundation, Apache projects, or the communities around those projects.
> We are specifically looking for presentations in the following
> categories:
>
> * API & Microservices
> * Big Data Compute
> * Big Data Storage
> * Cassandra
> * CloudStack
> * Community
> * Data Engineering
> * Fintech
> * Groovy
> * Incubator
> * IoT
> * Performance Engineering
> * Search
> * Tomcat, Httpd and other servers
>
> Additionally, we are thrilled to introduce a new feature this year: a
> poster session. This addition will provide an excellent platform for
> showcasing high-level projects and incubator initiatives in a visually
> engaging manner. We believe this will foster lively discussions and
> facilitate networking opportunities among participants.
>
> All my best, and thanks so much for your participation,
>
> Ryan Skraba (on behalf of the program committee)
>
> [Countdown]:
> https://www.timeanddate.com/countdown/to?iso=20240112T2359&p0=1440
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@community.apache.org
> For additional commands, e-mail: dev-h...@community.apache.org
>
>


[DISCUSS] Update cassandra-stress to use Apache Commons CLI (CASSANDRA-18661)

2024-03-08 Thread Claude Warren, Jr via dev
I have been working on CASSANDRA-18661 to see if it is possible to migrate
to the Apache commons-cli as noted in the ticket.  It is possible to do so,
and after several pull requests to commons-cli, I have managed to migrate
the settings of the stress tool.  We will have to wait for commons-cli
1.7.0 to be released before we can merge this change into the trunk.
However, I thought it expedient to discuss a few items before that pull
request is made.


   1. The original options were serializable; the commons-cli-based ones are
   not.  I think this was to support Thrift and is no longer necessary.  I
   would like to remove the Serializable interface from the options.  If this
   is not possible, extra work will need to be done to ensure that
   serialization will work.
   2. There has been some discussion on the ticket about legacy scripts
   using the original options, making this a breaking change.  It should be
   possible to deploy the same functional stress test with the two different
   command-line interfaces.  This will take some extra work.  The net result
   is that the changes are only in the command line and do not impact the
   operation once the options have been set.

Finally, there were few tests that showed what the various options were
doing.  I have written tests for all the new settings implementations.  I
think that if we keep two interfaces it would be possible, and advisable, to
modify the command-line tests to support both CLIs.

If you are interested to see what the new CLI options are, I have pasted
the output of a help command into the jira ticket.

Thank you for your time,
Claude


Patently invalid Compression parameters in CompressedSequentialWriterTest

2024-03-15 Thread Claude Warren, Jr via dev
I have been working at cleaning up the Yaml configuration for default table
compression settings and found that the CompressedSequentialWriterTest uses
some parameters that are outside the acceptable limits (like bufferLength
not a power of 2, or maxCompressedLength > bufferLength).

I can understand that these settings were used to set some extreme
condition, but shouldn't they be within the max/min ranges?

Is it also possible that the tests are not testing what we think they are
because of the parameter issue?

I have left the code path in place that allows creation of invalid
parameters, but I am concerned.

Claude
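For illustration, the limits mentioned above can be sketched as a small stand-alone check. This is a hedged sketch only: the parameter names (bufferLength, maxCompressedLength) follow the wording of this email, and the checks are assumptions about the intent, not Cassandra's actual validation code.

```java
// Hedged sketch of the kind of compression-parameter validation discussed
// above; names and limits are illustrative, not Cassandra's actual code.
public class CompressionParamCheck {
    // A valid buffer/chunk length is a positive power of 2.
    public static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Reject parameter combinations like those found in the test:
    // a bufferLength that is not a power of 2, or a
    // maxCompressedLength greater than bufferLength.
    public static void validate(int bufferLength, int maxCompressedLength) {
        if (!isPowerOfTwo(bufferLength))
            throw new IllegalArgumentException(
                "bufferLength must be a power of 2: " + bufferLength);
        if (maxCompressedLength > bufferLength)
            throw new IllegalArgumentException(
                "maxCompressedLength must be <= bufferLength");
    }

    public static void main(String[] args) {
        validate(65536, 65536);                   // within limits
        System.out.println(isPowerOfTwo(65536));  // 2^16, a power of two
        System.out.println(isPowerOfTwo(65537));  // not a power of two
    }
}
```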


Default table compression defined in yaml.

2024-03-18 Thread Claude Warren, Jr via dev
After much work by several people, I have pulled together the changes to
define the default compression in the cassandra.yaml file and have created
a pull request [1].

If you are interested in this topic, please take a look at the changes and
give at least a cursory review.

[1]  https://github.com/apache/cassandra/pull/3168

Thanks,
Claude


Re: Default table compression defined in yaml.

2024-03-19 Thread Claude Warren, Jr via dev
The earlier request was to be able to take the CQL statement and (with very
little modification) use it in the YAML.  While I agree that introducing
the new setting to remove it later is a bit wonky, it is necessary to
support the current CQL statement.  Unless the CQL statement has changed
already.

On Tue, Mar 19, 2024 at 10:52 AM Bowen Song via dev <
dev@cassandra.apache.org> wrote:

> I believe the `foobar_in_kb: 123` format in the cassandra.yaml file is
> deprecated, and the new format is `foobar: 123KiB`. Is there a need to
> introduce new settings entries with the deprecated format only to be
> removed at a later version?
>
>
> On 18/03/2024 14:39, Claude Warren, Jr via dev wrote:
>
> After much work by several people, I have pulled together the changes to
> define the default compression in the cassandra.yaml file and have created
> a pull request [1].
>
> If you are interested this in topic, please take a look at the changes and
> give at least a cursory review.
>
> [1]  https://github.com/apache/cassandra/pull/3168
>
> Thanks,
> Claude
>
>


Re: Default table compression defined in yaml.

2024-03-19 Thread Claude Warren, Jr via dev
We cannot support both the "rule" that new settings have the new format
and the design goal that the CQL statement format be accepted.

We came to a compromise:
We introduced the new chunk_length parameter that requires a DataStorageSpec.
We reused the CQL chunk_length_in_kb parameter to accept the CQL format.

I think this is a reasonable compromise.  We could emphasize the
chunk_length parameter in documentation changes and leave the
chunk_length_in_kb parameter to a discussion of converting CQL to YAML
configuration.
We could put in a log message that recommends the correct chunk_length
parameter based on the chunk_length_in_kb value.  But I do not see a way to
satisfy both the new-format requirement and CQL-format support.

We can deprecate the chunk_length_in_kb as soon as CQL changes to use a
DataStorageSpec for its parameter, but I do not know of any proposals to
change CQL.
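To make the compromise concrete, both spellings of the same default could appear in cassandra.yaml. This is a hedged sketch: the section and key names follow the discussion above, and the values are illustrative, not the final configuration.

```yaml
# New-format parameter, parsed as a DataStorageSpec:
sstable_compression:
  class_name: LZ4Compressor
  chunk_length: 16KiB

# CQL-compatible alternative, easing copy/paste from a
# CREATE TABLE ... WITH compression = {...} statement:
# sstable_compression:
#   class_name: LZ4Compressor
#   chunk_length_in_kb: 16
```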


On Tue, Mar 19, 2024 at 1:09 PM Ekaterina Dimitrova 
wrote:

> Any new settings are expected to be added in the new format
>
> On Tue, 19 Mar 2024 at 5:52, Bowen Song via dev 
> wrote:
>
>> I believe the `foobar_in_kb: 123` format in the cassandra.yaml file is
>> deprecated, and the new format is `foobar: 123KiB`. Is there a need to
>> introduce new settings entries with the deprecated format only to be
>> removed at a later version?
>>
>>
>> On 18/03/2024 14:39, Claude Warren, Jr via dev wrote:
>>
>> After much work by several people, I have pulled together the changes to
>> define the default compression in the cassandra.yaml file and have created
>> a pull request [1].
>>
>> If you are interested this in topic, please take a look at the changes
>> and give at least a cursory review.
>>
>> [1]  https://github.com/apache/cassandra/pull/3168
>>
>> Thanks,
>> Claude
>>
>>


Re: Default table compression defined in yaml.

2024-03-21 Thread Claude Warren, Jr via dev
Jacek,

I am a bit confused here.  I find a key for "sstable" in the yaml, but it is
commented out by default.  There are a number of options under it that are
commented out, then one that is not, and then the "default_compaction"
section, which I assume is supposed to apply to the "sstable" section.  Are
you saying that the "sstable_compression" section that we introduced should
be placed as a child of the "sstable" key (and probably renamed to
"default_compression")?

I have included the keys from the trunk yaml below with non-key comments
excluded.  The way I read it, either the "sstable" key is not required and a
user can just uncomment "column_index_size"; or "column_index_cache_size"
is not really used because it would be under
"sstable/column_index_cache_size" in the Config; or the "sstable:" is only
intended to be a visual break / section for the human editor.

Can you or someone clarify this for me?

#sstable:
#  selected_format: big
# column_index_size: 4KiB
column_index_cache_size: 2KiB
# default_compaction:
#   class_name: SizeTieredCompactionStrategy
#   parameters:
# min_threshold: 4
# max_threshold: 32
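If that reading is right, a hedged sketch of the nested layout would look like the following; the key names are illustrative, taken from this discussion, and not final.

```yaml
sstable:
  selected_format: big
  column_index_size: 4KiB
  default_compaction:
    class_name: SizeTieredCompactionStrategy
  default_compression:        # the section we introduced, renamed and nested
    class_name: LZ4Compressor
    chunk_length: 16KiB
```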

On Wed, Mar 20, 2024 at 10:31 PM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Compression params for sstables should be under the "sstable" key.
>
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> wt., 19 mar 2024 o 13:10 Ekaterina Dimitrova 
> napisał(a):
>
>> Any new settings are expected to be added in the new format
>>
>> On Tue, 19 Mar 2024 at 5:52, Bowen Song via dev 
>> wrote:
>>
>>> I believe the `foobar_in_kb: 123` format in the cassandra.yaml file is
>>> deprecated, and the new format is `foobar: 123KiB`. Is there a need to
>>> introduce new settings entries with the deprecated format only to be
>>> removed at a later version?
>>>
>>>
>>> On 18/03/2024 14:39, Claude Warren, Jr via dev wrote:
>>>
>>> After much work by several people, I have pulled together the changes to
>>> define the default compression in the cassandra.yaml file and have created
>>> a pull request [1].
>>>
>>> If you are interested this in topic, please take a look at the changes
>>> and give at least a cursory review.
>>>
>>> [1]  https://github.com/apache/cassandra/pull/3168
>>>
>>> Thanks,
>>> Claude
>>>
>>>


Re: Default table compression defined in yaml.

2024-03-21 Thread Claude Warren, Jr via dev
The text I posted above is directly from the yaml.  Is it intended that
"sstable:" is to be the first segment of the yaml key for
"default_compaction"?  If so, it won't be, because column_index_cache_size
starts in the first column.

I am happy to move the new configuration section, but I don't follow how
this is to work.



On Thu, Mar 21, 2024 at 1:23 PM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Only indented items below "sstable" belong to "sstable". It is commented
> out by default to make it clear that it is not required and the default
> values apply.
>
> There are a number of sstable parameters which are historically spread
> across the yaml with no structure. The point is that we should not add to
> that mess and try to group the new stuff.
>
> "default_compression" under ""sstable" key sounds good to me.
>
> - - -- --- -  -
> Jacek Lewandowski
>
>
> czw., 21 mar 2024 o 08:32 Claude Warren, Jr via dev <
> dev@cassandra.apache.org> napisał(a):
>
>> Jacek,
>>
>> I am a bit confused here.  I find a key for "sstable" in the yaml but it
>> is commented out by default.  There are a number of options under it that
>> are commented out and then one that is not and then the
>> "default_compaction" section, which I assume is supposed to apply to the
>> "sstable" section.  Are you saying that the "sstable_compression" section
>> that we introduced should be placed as a child to the "sstable" key (and
>> probably renamed to default_compression"?
>>
>> I have included the keys from the trunk yaml below with non-key comments
>> excluded.  The way I read it either the "sstable" key is not required and a
>> user can just uncomment "column_index_size"; or "column_index_cache_size"
>> is not really used because it would be under
>> "sstable/column_index_cache_size" in the Config; or the "sstable:" is only
>> intended to be a visual break / section for the human editor.
>>
>> Can you or someone clarify this form me?
>>
>> #sstable:
>> #  selected_format: big
>> # column_index_size: 4KiB
>> column_index_cache_size: 2KiB
>> # default_compaction:
>> #   class_name: SizeTieredCompactionStrategy
>> #   parameters:
>> # min_threshold: 4
>> # max_threshold: 32
>>
>> On Wed, Mar 20, 2024 at 10:31 PM Jacek Lewandowski <
>> lewandowski.ja...@gmail.com> wrote:
>>
>>> Compression params for sstables should be under the "sstable" key.
>>>
>>>
>>> - - -- --- -  -
>>> Jacek Lewandowski
>>>
>>>
>>> wt., 19 mar 2024 o 13:10 Ekaterina Dimitrova 
>>> napisał(a):
>>>
>>>> Any new settings are expected to be added in the new format
>>>>
>>>> On Tue, 19 Mar 2024 at 5:52, Bowen Song via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> I believe the `foobar_in_kb: 123` format in the cassandra.yaml file is
>>>>> deprecated, and the new format is `foobar: 123KiB`. Is there a need to
>>>>> introduce new settings entries with the deprecated format only to be
>>>>> removed at a later version?
>>>>>
>>>>>
>>>>> On 18/03/2024 14:39, Claude Warren, Jr via dev wrote:
>>>>>
>>>>> After much work by several people, I have pulled together the changes
>>>>> to define the default compression in the cassandra.yaml file and have
>>>>> created a pull request [1].
>>>>>
>>>>> If you are interested this in topic, please take a look at the changes
>>>>> and give at least a cursory review.
>>>>>
>>>>> [1]  https://github.com/apache/cassandra/pull/3168
>>>>>
>>>>> Thanks,
>>>>> Claude
>>>>>
>>>>>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Claude Warren, Jr via dev
I think this solution would solve one of the problems that Aiven has with
node replacement currently.  Though TCM will probably help as well.

On Mon, Apr 15, 2024 at 11:47 PM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Thanks for the proposal. I second Jordan that we need more abstraction in
> (1), e.g. most cloud provider allow for disk snapshots and starting nodes
> from a snapshot which would be a good mechanism if you find yourself there.
>
> German
> --
> *From:* Jordan West 
> *Sent:* Sunday, April 14, 2024 12:27 PM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra
> Sidecar for Live Migrating Instances
>
> Thanks for proposing this CEP! We have something like this internally so I
> have some familiarity with the approach and the challenges. After reading
> the CEP a couple things come to mind:
>
> 1. I would like to see more abstraction of how the files get moved / put
> in place with the proposed solution being the default implementation. That
> would allow others to plug in alternatives means of data movement like
> pulling down backups from S3 or rsync, etc.
>
> 2. I do agree with Jon’s last email that the lifecycle / orchestration
> portion is the more challenging aspect. It would be nice to address that as
> well so we don’t end up with something like repair where the building
> blocks are there but the hard parts are left to the operator. I do,
> however, see that portion being done in a follow-on CEP to limit the scope
> of CEP-40 and have a higher chance for success by incrementally adding
> these features.
>
> Jordan
>
> On Thu, Apr 11, 2024 at 12:31 Jon Haddad  wrote:
>
> First off, let me apologize for my initial reply, it came off harsher than
> I had intended.
>
> I know I didn't say it initially, but I like the idea of making it easier
> to replace a node.  I think it's probably not obvious to folks that you can
> use rsync (with stunnel, or alternatively rclone), and for a lot of teams
> it's intimidating to do so.  Whether it actually is easy or not to do with
> rsync is irrelevant.  Having tooling that does it right is better than duct
> taping things together.
>
> So with that said, if you're looking to get feedback on how to make the
> CEP more generally useful, I have a couple thoughts.
>
> > Managing the Cassandra processes like bringing them up or down while
> migrating the instances.
>
> Maybe I missed this, but I thought we already had support for managing the
> C* lifecycle with the sidecar?  Maybe I'm misremembering.  It seems to me
> that adding the ability to make this entire workflow self managed would be
> the biggest win, because having a live migrate *feature* instead of what's
> essentially a runbook would be far more useful.
>
> > To verify whether the desired file set matches with source, only file
> path and size is considered at the moment. Strict binary level verification
> is deferred for later.
>
> Scott already mentioned this is a problem and I agree, we cannot simply
> rely on file path and size.
>
> TL;DR: I like the intention of the CEP.  I think it would be better if it
> managed the entire lifecycle of the migration, but you might not have an
> appetite to implement all that.
>
> Jon
>
>
> On Thu, Apr 11, 2024 at 10:01 AM Venkata Hari Krishna Nukala <
> n.v.harikrishna.apa...@gmail.com> wrote:
>
> Thanks Jon & Scott for taking time to go through this CEP and providing
> inputs.
>
> I am completely with what Scott had mentioned earlier (I would have added
> more details into the CEP). Adding a few more points to the same.
>
> Having a solution with Sidecar can make the migration easy without
> depending on rsync. At least in the cases I have seen, rsync is not enabled
> by default and most of them want to run OS/images with as minimal
> requirements as possible. Installing rsync requires admin privileges and
> syncing data is a manual operation. If an API is provided with Sidecar,
> then tooling can be built around it reducing the scope for manual errors.
>
> From performance wise, at least in the cases I had seen, the File
> Streaming API in Sidecar performs a lot better. To give an idea on the
> performance, I would like to quote "up to 7 Gbps/instance writes (depending
> on hardware)" from CEP-28 as this CEP proposes to leverage the same.
>
> For:
>
> >When enabled for LCS, single sstable uplevel will mutate only the level
> of an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level of an SSTable on the source via single sstable uplevel
> may not be caught by a digest based only on filename and length.
>
> In this case file size may not change, but the timestamp of last modified
> time would change, right? It is addressed in section MIGRATING ONE
> INSTANCE, point 2.b.ii which says "If a file is present at the destination
> but did not m

Re: discuss: add to_human_size function

2024-04-19 Thread Claude Warren, Jr via dev
I like the idea.  Is the intention to have the output of the function be
parsable by the config parsers like DataRateSpec, DataStorageSpec, or DurationSpec?

Claude

On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg  wrote:

> Hi,
>
> I think it’s a good quality of life improvement, but I am someone who
> believes in a rich set of built-in functions being a good thing.
>
> A format function is a bit more scope and kind of orthogonal. It would
> still be good to have shorthand functions for things like size.
>
> Ariel
>
> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>
> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables
> but these numbers are just hard to parse visually. Users can indeed use
> this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had
> something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>
>
>


Re: discuss: add to_human_size function

2024-04-25 Thread Claude Warren, Jr via dev
TiB is not yet in DataStorageSpec (perhaps we should add it).

A quick review tells me that all the units are unique across the 3 specs.
As long as we guarantee that in the future, the method you propose should be
easily expandable to the other specs.

+1 to this idea.
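For concreteness, the overloads Štefan sketches below might be invoked like this; the function is only proposed in CASSANDRA-19546, and the table and column names here are hypothetical.

```sql
-- single-argument form: the value is assumed to be in bytes
SELECT to_human_size(bytes_written) FROM ks.tbl;

-- explicit source and target units
SELECT to_human_size(size_kib, 'KiB', 'GiB') FROM ks.tbl;
```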

On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> That is a very interesting point, Claude. My so-far implementation is
> using FileUtils.stringifyFileSize which is just dividing a value by a
> respective divisor based on how big a value is. While this works, it will
> prevent you from specifying what unit you want that value to be converted
> to as well as it will prevent you from specifying what unit a value you
> provided is of. So, for example, if a column is known to be in kibibytes
> and we want that to be converted into gibibytes, that won't be possible
> because that function will think that a value is in bytes.
>
> It would be more appropriate to have something like this:
>
> to_human_size(val) -> alias to FileUtils.stringifyFileSize, without any
> source nor target unit, it will consider it to be in bytes and it will
> convert it like in FileUtils.stringifyFileSize
>
> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>
> the first argument is the source unit, the second argument is target unit
>
> to_human_size(val, 'B', 'MiB')
> to_human_size(val, 'B', 'GiB')
> to_human_size(val, 'KiB', 'GiB')
> to_human_size(val, 'KiB', 'TiB')
>
> I think this is more flexible and we should funnel this via
> DataStorageSpec and similar as you mentioned.
>
> In the future, we might also add to_human_duration which would be
> implemented against DurationSpec so similar conversions are possible.
>
> On Fri, Apr 19, 2024 at 10:53 AM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
>> I like the idea.  Is the intention to have the of the function be
>> parsable by the config  parsers like DataRateSpec, DataStorageSpec, or
>> DurationSpec?
>>
>> Claude
>>
>> On Thu, Apr 18, 2024 at 9:47 PM Ariel Weisberg  wrote:
>>
>>> Hi,
>>>
>>> I think it’s a good quality of life improvement, but I am someone who
>>> believes in a rich set of built-in functions being a good thing.
>>>
>>> A format function is a bit more scope and kind of orthogonal. It would
>>> still be good to have shorthand functions for things like size.
>>>
>>> Ariel
>>>
>>> On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
>>>
>>> Hi,
>>>
>>> I want to propose CASSANDRA-19546. It would be possible to convert raw
>>> numbers to something human-friendly.
>>> There are cases when we write just a number of bytes in our system
>>> tables but these numbers are just hard to parse visually. Users can indeed
>>> use this for their tables too if they find it useful.
>>>
>>> Also, a user can indeed write a UDF for this but I would prefer if we
>>> had something baked in.
>>>
>>> Does this make sense to people? Are there any other approaches to do
>>> this?
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-19546
>>> https://github.com/apache/cassandra/pull/3239/files
>>>
>>> Regards
>>>
>>>
>>>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-05-01 Thread Claude Warren, Jr via dev
Alex,

 you write:

> We can implement CEP-40 using a similar approach: we can leave the source
> node as both a read and write target, and allow the new node to be a target
> for (pending) writes. Unfortunately, this does not help with availability
> (in fact, it decreases write availability, since we will have to collect
> 2+1 mandatory write responses instead of just 2), but increases durability,
> and I think helps to fully eliminate the second phase. This also increases
> read availability when the source node is up, since we can still use the
> source node as a part of read quorum.


Would it be possible to create a new type of write target node?  The new
write target node is notified of writes (like any other write node) but
does not participate in the write availability calculation.  In this way, a
node that is being migrated to could receive writes while having minimal
impact on the current operation of the cluster.

Claude



On Wed, May 1, 2024 at 12:33 PM Alex Petrov  wrote:

> Thank you for submitting this CEP!
>
> Wanted to discuss this point from the description:
>
> > How to bring up/down Cassandra/Sidecar instances or making/applying
> config changes are outside the scope of this document.
>
> One advantage of doing migration via sidecar is the fact that we can
> stream sstables to the target node from the source node while the source
> node is down. Also if the source node is down, it does not matter if we
> can’t use it as a write target However, if we are replacing a live node, we
> do lose both durability and availability during the second copy phase.
> There are copious other advantages described by others in the thread above.
>
> For example, we have three adjacent nodes A,B,C and simple RF 3. C
> (source) is up and is being replaced with live-migrated D (destination).
> According to the described process in CEP-40, we perform streaming in 2
> phases: first one is a full copy (similar to bootstrap/replacement in
> cassandra), and the second one is just a diff. The second phase is still
> going to take a non-trivial amount of time, and is likely to last at very
> least minutes. During this time, we only have nodes A and B as both read
> and write targets, with no alternatives: we have to have both of them
> present for any operation, and losing either one of them leaves us with
> only one copy of data.
>
> To contrast this, TCM bootstrap process is 4-step: between the old owner
> being phased out and the new owner brought in, we always ensure r/w quorum
> consistency and liveness of at least 2 nodes for the read quorum, 3 nodes
> available for reads in best case, and 2+1 pending replica for the write
> quorum, with 4 nodes (3 existing owners + 1 pending) being available for
> writes in best case. Replacement in TCM is implemented similarly, with the
> old node remaining an (unavailable) read target, but new node already being
> the target for (pending) writes.
>
> We can implement CEP-40 using a similar approach: we can leave the source
> node as both a read and write target, and allow the new node to be a target
> for (pending) writes. Unfortunately, this does not help with availability
> (in fact, it decreases write availability, since we will have to collect
> 2+1 mandatory write responses instead of just 2), but increases durability,
> and I think helps to fully eliminate the second phase. This also increases
> read availability when the source node is up, since we can still use the
> source node as a part of read quorum.
>
> I think if we want to call this feature "live migration", since this term
> is used in hypervisor community to describe an instant and uninterrupted
> instance migration from one host to the other without guest instance being
> able to notice as much as the time jump, we may want to provide similar
> guarantees.
>
> I am also not against to have this to be done post-factum, after
> implementation of CEP in its current form, but I think it would be good to
> have good understanding of availability and durability guarantees we want
> to provide with it, and have it stated explicitly, for both "source node
> down" and "source node up" cases. That said, since we will have to
> integrate CEP-40 with TCM, and will have to ensure correctness of sstable
> diffing for the second phase, it might make sense to consider reusing some
> of the existing replacement logic from TCM. Just to make sure this is
> mentioned explicitly, my proposal is only concerned with the second copy
> phase, without any implications about the first.
>
> Thank you,
> --Alex
>
> On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. De

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Claude Warren, Jr via dev
>
> 2)
> Is part of an enum is somehow suplying the lack of enum types. Constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );


Are these not just "CONSTRAINT inList([List of valid values], field);"  and
"CONSTRAINT not inList([List of valid values], field);"?
At this point doesn't "CONSTRAINT p1 != p2" devolve to "CONSTRAINT not
inList([p1], p2);"?

Can "[List of values]" point to a variable containing a list, or does it
require hard-coding the values in the constraint itself?
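A hedged sketch of the suggested reduction, reusing the CEP's example syntax; inList is hypothetical and not part of the current proposal:

```sql
CREATE TABLE keyspace.table (
  field text CONSTRAINT inList(['foo', 'foo2'], field),                       -- replaces belongsToEnum
  other text CONSTRAINT not inList(['blocked_foo', 'blocked_foo2'], other),   -- replaces isNotBlocked
  ...
);
```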



On Tue, Jun 11, 2024 at 6:23 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi Štephan
>
> I'll address the different points:
> 1)
> An example (possibly a stretch) of use case for != constraint would be:
> Let's say you have a table in which you want to record a movement, from
> position p1 to position p2. You may want to check that those two are
> different to make sure there is actual movement.
>
> CREATE TABLE keyspace.table (
>   p1 int,
>   p2 int,
>   ...,
>   CONSTRAINT p1 != p2
> );
>
> For the case of ==, I agree that it is harder to come up with a valid use
> case, and I added it for completion.
>
> 2)
> Is part of an enum is somehow suplying the lack of enum types. Constraint
> could be something like CONSTRAINT belongsToEnum([list of valid values],
> field):
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT belongsToEnum(['foo', 'foo2'], field),
>   ...
> );
>
> 3)
> Similarly, we can check and reject if a term is part of a list of blocked
> terms:
> CREATE TABLE keyspace.table (
>   field text CONSTRAINT isNotBlocked(['blocked_foo', 'blocked_foo2'],
> field),
>   ...
> );
>
> Please let me know if this helps,
> Bernardo
>
>
>
> On Jun 11, 2024, at 6:29 AM, Štefan Miklošovič <
> stefan.mikloso...@gmail.com> wrote:
>
> Hi Bernardo,
>
> 1) Could you elaborate on these two constraints?
>
> == and != ?
>
> What is the use case? Why would I want to have data in a database stored
> in some column which would need to be _same as my constraint_ and which
> _could not_ be same as my constraint? Can you give me at least one example
> of each? It looks like I am going to put a constant into a database in case
> of ==, wouldn't a static column be better?
>
> 2) For examples of text based types you mentioned: "is part of an enum" -
> how would you enforce this in Cassandra? What enum do we have in CQL?
> 3) What does "is it block listed" mean?
>
> In the meanwhile, I made changes to CEP-24 to move transactionality into
> optional features.
>
> On Tue, Jun 11, 2024 at 12:18 AM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Hi everyone,
>>
>> After the feedback, I'd like to make a recap of what we have discussed in
>> this thread and try to move forward with the conversation.
>>
>> I made some clarifications:
>> - Constraints are only applied at write time.
>> - Guardrail configurations should maintain preference over what's being
>> defined as a constraint.
>>
>> *Specify constraints:*
>> There is a general feedback around adding more concrete examples than the
>> ones that can be found on the CEP document.
>> Basically, the initial constraints I am proposing are:
>> - SizeOf Constraint for String types, as in
>> name text CONSTRAINT sizeOf(name) < 256
>>
>> - Value Constraint for numeric types
>> number_of_items int CONSTRAINT number_of_items < 1000
>>
>> Those two alone and combined provide a lot of flexibility, and allow
>> complex validations that enable "new types" such as:
>>
>> CREATE TYPE keyspace.cidr_address_ipv4 (
>>   ip_address inet,
>>   subnet_mask int,
>>   CONSTRAINT subnet_mask > 0,
>>   CONSTRAINT subnet_mask < 32
>> )
>>
>> CREATE TYPE keyspace.color (
>>   r int,
>>   g int,
>>   b int,
>>   CONSTRAINT r >= 0,
>>   CONSTRAINT r < 255,
>>   CONSTRAINT g >= 0,
>>   CONSTRAINT g < 255,
>>   CONSTRAINT b >= 0,
>>   CONSTRAINT b < 255
>> )
>>
>>
>> Those two initial Constraints are the fundamental constraints that would
>> give value to the feature. The framework can (and will) be extended with
>> other Constraints, leaving us with the following:
>>
>> For numeric types:
>> - Max (<)
>> - Min (>)
>> - Equality (==)
>> - Difference (!=)
>>
>> For date types:
>> - Before (<)
>> - After (>)
>>
>> For text based types:
>> - Size (sizeOf)
>> - isJson (is the text a json?)
>> - complies with a given pattern
>> - Is it block listed?
>> - Is it part of an enum?
>>
>> General table constraints (including more than one column):
>> - Compare between numeric types (a < b, a > b, a != b, …)
>> - Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
>>
>> I have updated the CEP with this information.
>>
>> *Potential 

Re: [DISCUSS] Stream Pipelines on hot paths

2024-06-13 Thread Claude Warren, Jr via dev
I brought this topic to commons-collections because we use some streaming
in the Bloom filter implementation where we are very sensitive to
processing time.  I received this answer over there and thought I would
bring the information here:

You need to test it with some realistic data for a benchmark.
>
> In Commons Statistics we have a case where all elements of an array are
> passed to a consumer. So you have either:
>
> int[] a = ...
> IntConsumer c = ...
>
> Arrays.stream(a).forEach(c::accept)
>
> vs
>
> for (final int i : a) {
> c.accept(i);
> }
>
> When the array is less than 10 in length then the stream is noticeably
> slower. When it is longer the difference becomes moot, as the cost of a
> Stream is more of a constant than a scaling factor. It also depends on the
> speed of the consumer as to whether you will notice. But (single threaded)
> the loop is never slower. The stream provides the advantage of intermediate
> state operations on the stream contents, and parallelisation. If you are
> not using these then the stream is extra overhead.
>
> So the questions are: how large is your collection that you are streaming;
> what is consuming the stream; and how critical is the runtime performance
> for this process?
>

Link to discussion:
https://lists.apache.org/thread/1wlz9xt49n8o8rfkp8lrzy3fmpzsyqnn
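For illustration, the loop-vs-stream comparison in the quote can be sketched as
a rough micro-benchmark. This is my own sketch, not project code: the class
name, array sizes, and repetition counts are assumptions, and a trustworthy
measurement would use JMH with proper warmup and forking rather than
System.nanoTime().

```java
import java.util.Arrays;
import java.util.function.IntConsumer;

// Illustrative sketch of the loop-vs-stream comparison quoted above.
public class StreamVsLoop {

    static long sum; // sink so the JIT cannot eliminate the work

    // Time `reps` passes over the array using an enhanced for loop.
    static long timeLoop(int[] a, int reps) {
        IntConsumer c = i -> sum += i;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            for (final int i : a) {
                c.accept(i);
            }
        }
        return System.nanoTime() - start;
    }

    // Time `reps` passes over the array using Arrays.stream().forEach().
    static long timeStream(int[] a, int reps) {
        IntConsumer c = i -> sum += i;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            Arrays.stream(a).forEach(c::accept);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] small = new int[8];      // below the ~10-element threshold mentioned
        int[] large = new int[10_000]; // stream overhead amortised at this size
        int reps = 100_000;
        // warm up both paths before timing
        timeLoop(large, 1_000);
        timeStream(large, 1_000);
        System.out.printf("small: loop=%dns stream=%dns%n",
                          timeLoop(small, reps), timeStream(small, reps));
        System.out.printf("large: loop=%dns stream=%dns%n",
                          timeLoop(large, reps), timeStream(large, reps));
    }
}
```

On typical JVMs this shows the quoted effect: the stream lags noticeably on the
small array and the gap shrinks on the large one, though exact numbers vary by
JVM and hardware.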

Claude


On Fri, Jun 7, 2024 at 6:54 PM Jordan West  wrote:

> Agreed Aleksey. I wouldn’t be opposed to more nuanced use but the burden
> that adds seems impractical. A simple rule is easier.
>
> Jordan
>
> On Fri, Jun 7, 2024 at 05:59 Aleksey Yeshchenko  wrote:
>
>> I am okay with its use off hot paths in principle, and I’ve done it
>> myself.
>>
>> But as others have mentioned, it’s not obvious to every contributor what
>> is and isn’t a hot path. Also, the codebase is a living, shifting thing: a
>> cold path today can suddenly become hot tomorrow - it’s not uncommon.
>>
>> Another benefit to this binary decision flow is that we can easily
>> enforce it with our lint tooling just for non-test part of the codebase.
>> It’s just easier to scale.
>>
>>
>> On 7 Jun 2024, at 10:27, Štefan Miklošovič 
>> wrote:
>>
>> I think it makes sense to use streams to make the life easier for a dev
>> when constructing some log messages or something like that in clearly not
>> hot paths. Nothing wrong with that ... Collectors.joining(", ") and that
>> kind of stuff. I do not think that doing this aggressively and "orthodoxly"
>> is necessary.
>>
>>
>>


Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-07-01 Thread Claude Warren, Jr via dev
Perhaps we should consider a Milestone release.  At least in some projects
this is a way to provide a test bed with known issues that will be
corrected before an RC.

On Sun, Jun 30, 2024 at 9:50 PM Jon Haddad  wrote:

> This came in after our vote, but we might also have a problem with
> performing schema changes after a full restart.  Appears to only be if the
> entire cluster was shut down, according to the report.  If it's true, this
> might affect anyone trying to restore from a backup.  This would also be a
> blocker for me, if that's the case.
>
> https://issues.apache.org/jira/browse/CASSANDRA-19735
>
> Jon
>
>
> On Thu, Jun 27, 2024 at 9:49 PM Jon Haddad  wrote:
>
>> Thanks for confirming this, Blake. I agree that we should not knowingly
>> ship new versions with severe bugs that cause the DB to crash, regression
>> or not.
>>
>> -1 from me as well
>>
>>
>> On Fri, Jun 28, 2024 at 1:39 AM Blake Eggleston 
>> wrote:
>>
>>> Looking at the ticket, I’d say Jon’s concern is legitimate. The
>>> segfaults Jon is seeing are probably caused by paxos V2 when combined with
>>> off heap memtables for the reason Benedict suggests in the JIRA. This
>>> problem will continue to exist in 5.0. Unfortunately, it looks like the
>>> patch posted is not enough to address the issue and will need to be a bit
>>> more involved to properly fix the problem.
>>>
>>> While this is not a regression, I think Jon’s point about trie memtables
>>> increasing usage of off heap memtables is a good one, and anyway we
>>> shouldn’t be doing major releases with known process crashing bugs.
>>>
>>> So I’m voting -1 on this release and will work with Jon and Benedict to
>>> get this fixed.
>>>
>>> Thanks,
>>>
>>> Blake
>>>
>>>
>>> On Jun 26, 2024, at 6:47 AM, Josh McKenzie  wrote:
>>>
>>> Blake or Benedict - can either of you speak to Jon's concerns around
>>> CASSANDRA-19668?
>>>
>>> On Wed, Jun 26, 2024, at 12:18 AM, Jeff Jirsa wrote:
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> On Jun 25, 2024, at 5:04 AM, Mick Semb Wever  wrote:
>>>
>>> 
>>>
>>> Proposing the test build of Cassandra 5.0-rc1 for release.
>>>
>>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
>>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
>>> Maven Artifacts:
>>> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>>>
>>> The Source and Build Artifacts, and the Debian and RPM packages and
>>> repositories, are available here:
>>> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
>>>
>>> The vote will be open for 72 hours (longer if needed). Everyone who has
>>> tested the build is invited to vote. Votes by PMC members are considered
>>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>>
>>> [1]: CHANGES.txt:
>>> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
>>> [2]: NEWS.txt:
>>> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt
>>>
>>>
>>>


Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-07-03 Thread Claude Warren, Jr via dev
Because you do not know if the issues stopping the release will require an
API change -- so is the API stable?

On Mon, Jul 1, 2024 at 3:02 PM Josh McKenzie  wrote:

> Perhaps we should consider a Milestone release.  At least in some projects
> this is a way to provide a test bed with known issues that will be
> corrected before an RC.
>
> How does that differ from beta in our lifecycle? API stable but a test bed
> to suss out issues like this.
>
>
> On Mon, Jul 1, 2024, at 9:30 AM, Claude Warren, Jr via dev wrote:
>
> Perhaps we should consider a Milestone release.  At least in some projects
> this is a way to provide a test bed with known issues that will be
> corrected before an RC.
>
> On Sun, Jun 30, 2024 at 9:50 PM Jon Haddad  wrote:
>
> This came in after our vote, but we might also have a problem with
> performing schema changes after a full restart.  Appears to only be if the
> entire cluster was shut down, according to the report.  If it's true, this
> might affect anyone trying to restore from a backup.  This would also be a
> blocker for me, if that's the case.
>
> https://issues.apache.org/jira/browse/CASSANDRA-19735
>
> Jon
>
>
> On Thu, Jun 27, 2024 at 9:49 PM Jon Haddad  wrote:
>
> Thanks for confirming this, Blake. I agree that we should not knowingly
> ship new versions with severe bugs that cause the DB to crash, regression
> or not.
>
> -1 from me as well
>
>
> On Fri, Jun 28, 2024 at 1:39 AM Blake Eggleston 
> wrote:
>
> Looking at the ticket, I’d say Jon’s concern is legitimate. The segfaults
> Jon is seeing are probably caused by paxos V2 when combined with off heap
> memtables for the reason Benedict suggests in the JIRA. This problem will
> continue to exist in 5.0. Unfortunately, it looks like the patch posted is
> not enough to address the issue and will need to be a bit more involved to
> properly fix the problem.
>
> While this is not a regression, I think Jon’s point about trie memtables
> increasing usage of off heap memtables is a good one, and anyway we
> shouldn’t be doing major releases with known process crashing bugs.
>
> So I’m voting -1 on this release and will work with Jon and Benedict to
> get this fixed.
>
> Thanks,
>
> Blake
>
>
> On Jun 26, 2024, at 6:47 AM, Josh McKenzie  wrote:
>
> Blake or Benedict - can either of you speak to Jon's concerns around
> CASSANDRA-19668?
>
> On Wed, Jun 26, 2024, at 12:18 AM, Jeff Jirsa wrote:
>
>
> +1
>
>
>
> On Jun 25, 2024, at 5:04 AM, Mick Semb Wever  wrote:
>
> 
>
> Proposing the test build of Cassandra 5.0-rc1 for release.
>
> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt
>
>
>


Re: [DISCUSS] Replace airlift/airline library with Picocli

2024-07-16 Thread Claude Warren, Jr via dev
There are several reasons to consider updating; foremost in my mind are the
changes coming as part of the CRA in Europe.  IANAL, but I don't think that
non-maintained code will meet the CRA requirements, nor will code
maintained by a single individual.

Our best approach may be to try to get picocli merged into commons-cli (I
have done a fair amount of work on commons-cli recently and would be
willing to assist in this effort) or get picocli under the ASF umbrella,
assuming that Remko wants to do either of these things.

I believe that subcommands would be a welcome addition to commons-cli.  An
annotation processor may likewise be welcome.

Claude


On Tue, Jul 16, 2024 at 10:14 AM Aleksey Yeshchenko 
wrote:

> Hi Maxim,
>
> I think I’m not fully sold on the need to do anything at all here. The
> library may no longer be maintained, but so what if it isn’t, really?
>
> Parsing command line arguments is a pretty well defined problem, it’s not
> the kind of code that rots and needs to be updated to stay operational. If
> it works now it will keep working.
>
> Why would we have to update it sooner or later?
>
> I might be missing something, of course, but what are our pain points with
> airlift/airline in its current state?
>
> —
> AY
>
> > On 16 Jul 2024, at 02:07, Remko Popma  wrote:
> >
> > Hi Maxim, thank you for letting me know of this discussion.
> >
> > Hello everyone,
> >
> > I developed and maintain picocli; let me try to address the concerns
> raised below.
> >
> > For background, I am on the PMC for Apache Logging Services (mostly
> involved with Log4j), and on the PMC for Apache Groovy.
> > My involvement in these projects is why I chose the Apache 2.0 license.
> Apache is close to my heart and I have no intention to switch to another
> license.
> >
> > The picocli documentation mentions it is possible to incorporate picocli
> in one’s project by copying a single source file. This is not meant as a
> recommendation (I should probably clarify this in the docs). Some
> people/projects have resistance to using an external dependency for command
> line parsing and I thought this would alleviate that concern and make it
> easier for picocli to gain more adoption.
> > If you were to select picocli for Cassandra, I would recommend adding it
> as an external dependency via Maven or Gradle.
> >
> > I hope this is useful.
> >
> > Warmly,
> > Remko Popma
> >
> >
> >
> > On 2024/07/15 18:53:47 Maxim Muzafarov wrote:
> >> Hello everyone,
> >>
> >>
> >> I want to continue the discussion that was originally started here
> >> [2], however, it's better to move it to a new thread with an
> >> appropriate title, so that everyone is aware of the replacement
> >> library we're trying to agree on.
> >>
> >> The question is:
> >> Does everyone agree with using Picocli as an airlift/airline
> >> replacement for our cli tools?
> >> The prototype to look at is here [1].
> >>
> >>
> >> The reasons are as follows:
> >>
> >> Why to replace?
> >>
> >> There are several cli tools that rely on the airlift/airline library
> >> to mark up the commands: NodeTool, JMXTool, FullQueryLogTool,
> >> CompactionStress (with the size of the NodeTool dominating the rest of
> >> the tools). The airline is no longer maintained, so we will have to
> >> update it sooner or later anyway.
> >>
> >>
> >> What criteria?
> >>
> >> Before we dive into the pros and cons of each candidate, I think we
> >> have to formulate criteria for the libraries we are considering, based
> >> on what we already have in the source code (from Cassandra's
> >> perspective). This in itself limits the libraries we can consider.
> >>
> >> Criteria can be as follows:
> >> - Library licensing, including risks that it may change in the future
> >> (the asf libs are the safest for us from this perspective);
> >> - Similarity of library design (to the airline). This means that the
> >> closer the libraries are, the easier it is to migrate to them, and the
> >> easier it is to get guarantees that we haven't broken anything. The
> >> further away the libraries are, the more extra code and testing we
> >> need;
> >> - Backward compatibility. The ideal case is where the user doesn't
> >> even notice that a different library is being used under the hood.
> >> This includes both the help output and command output.
> >>
> >> Of course, all libraries need to be known and well-maintained.
> >>
> >> What candidates?
> >>
> >>
> >> Picocli
> >> https://picocli.info/
> >>
> >> This is the well-known cli library under the Apache 2.0 license, which
> >> is similar to what we have in source code right now. This also means
> >> that the amount of changes (despite the number of the commands)
> >> required to migrate what we have is quite small.
> >> In particular, I would like to point out that:
> >> - It allows us to unbind the jmx-specific command options from the
> >> commands themselves, so that they can be reused in other APIs (my
> >> goal);
> >> - We can customize the help output so 

Re: [DISCUSS] Replace airlift/airline library with Picocli

2024-07-17 Thread Claude Warren, Jr via dev
My CRA arguments basically revolve around the "Open Source Steward" from
the CRA.  As far as I recall, for open source software to be used in
commercial projects it must be maintained by a steward.  The definition of
steward is being discussed but foundations generally meet the requirement,
projects run by a single individual do not generally meet this
requirement.  Basically this section protects businesses from the "One
developer project in Nebraska" (see XKCD).

I think that moving forward picocli will probably move to a foundation or
will establish some sort of entity that meets the steward requirements.

So moving to picocli carries the risk of churn if we have to replace it
again later.  We should be asking picocli how they are going to deal with
the CRA requirements before we jump.

Claude

On Wed, Jul 17, 2024 at 2:42 AM Ariel Weisberg  wrote:

> This particular library doesn't really present transitive dependency
> issues. We already manually specify the version of the  two dependencies
> that look like the runtime dependencies.
>
> On Tue, Jul 16, 2024, at 11:39 AM, Jeff Jirsa wrote:
> > (Answering this as a cassandra person, don’t confuse this reply as
> > board/foundation guidance)
> >
> > The legal landscape is rapidly evolving with the CRA. The answer may
> > change in the future, but I think “we have a dependency we ship that’s
> > user-accessible and known to be abandoned” is an unhappy state that’s
> > indefensible if there’s ever a transitive dependency issue (e.g. log4j
> > style “oh who would have guessed that was possible” ).
> >
> > I’d look at this slightly differently - if someone proposed adding this
> > library right now, we’d all say no because it’s unmaintained. That
> > alone should be enough justification to fix the obvious problem - if
> > it’s unmaintained, let’s remove it before we’re doing it on fire.
> >
> >
> >
> >> On Jul 16, 2024, at 8:11 AM, Ariel Weisberg  wrote:
> >>
> >> Hi,
> >>
> >> I am pretty torn down the middle on this one. Unmaintained bad, but
> also Aleksey is right. If there are few/no dependencies in airline then it
> could be "done" given the narrow scope of what it does.
> >>
> >> It seems to depend on Guava, javax.inject, and findbugs. Seems like we
> can probably update the dependencies on our end if there is a security
> issue.
> >>
> >> Can we get clarity from Apache legal on whether simply being
> unmaintained will be something we are motivated to fix for legal reasons?
> >>
> >> If we have the code/prototype now (assuming we settle on Picocli), is
> the counter argument "churn bad" as strong? I don't anticipate changing
> from one to the other being too bad if someone else is footing the bill so
> to speak. Especially if Maxim is willing to take the lead on any bugs that
> we might uncover in the switch.
> >>
> >> Ariel
> >>
> >> On Mon, Jul 15, 2024, at 2:53 PM, Maxim Muzafarov wrote:
> >>> Hello everyone,
> >>>
> >>>
> >>> I want to continue the discussion that was originally started here
> >>> [2], however, it's better to move it to a new thread with an
> >>> appropriate title, so that everyone is aware of the replacement
> >>> library we're trying to agree on.
> >>>
> >>> The question is:
> >>> Does everyone agree with using Picocli as an airlift/airline
> >>> replacement for our cli tools?
> >>> The prototype to look at is here [1].
> >>>
> >>>
> >>> The reasons are as follows:
> >>>
> >>> Why to replace?
> >>>
> >>> There are several cli tools that rely on the airlift/airline library
> >>> to mark up the commands: NodeTool, JMXTool, FullQueryLogTool,
> >>> CompactionStress (with the size of the NodeTool dominating the rest of
> >>> the tools). The airline is no longer maintained, so we will have to
> >>> update it sooner or later anyway.
> >>>
> >>>
> >>> What criteria?
> >>>
> >>> Before we dive into the pros and cons of each candidate, I think we
> >>> have to formulate criteria for the libraries we are considering, based
> >>> on what we already have in the source code (from Cassandra's
> >>> perspective). This in itself limits the libraries we can consider.
> >>>
> >>> Criteria can be as follows:
> >>> - Library licensing, including risks that it may change in the future
> >>> (the asf libs are the safest for us from this perspective);
> >>> - Similarity of library design (to the airline). This means that the
> >>> closer the libraries are, the easier it is to migrate to them, and the
> >>> easier it is to get guarantees that we haven't broken anything. The
> >>> further away the libraries are, the more extra code and testing we
> >>> need;
> >>> - Backward compatibility. The ideal case is where the user doesn't
> >>> even notice that a different library is being used under the hood.
> >>> This includes both the help output and command output.
> >>>
> >>> Of course, all libraries need to be known and well-maintained.
> >>>
> >>> What candidates?
> >>>
> >>>
> >>> Picocli
> >>> https://picocli.info/
> >>>
> >>> This is the well-known cli

review request for pull 1741

2022-08-02 Thread Claude Warren, Jr via dev
Greetings.  Can I get a review of
https://github.com/apache/cassandra/pull/1741?  Other than the obvious issue
with CHANGES.txt, does anyone see anything that needs to be fixed?


CASSANDRA-14940 and flaky tests

2022-08-04 Thread Claude Warren, Jr via dev
I started looking at the backlog of critical errors in Jira.  The ticket
contains a fully working example of the issue.  While it was reported under
version 3.11.3 it appears to be present under 4.0.5.  I don't know the Go
language, but my reading of the script is that, against a single-node
Cassandra configuration, it inserts a value and then immediately updates
it.  At the end of the test the table is read and any record that was not
changed is noted as abnormal.  Most of the time there are no abnormal
entries; every now and then it fails.  Running with 10 inserts it will
generate at least one abnormal result in 8 out of 10 runs.

This test uses the SimpeStrategy replication class with a replication
factor of 1.

I ran the test with debug logging enabled in the logback.xml.  I ran the
test performing 50 inserts and updates, as 10 did not fail consistently
when debugging was enabled.
(Sounds like a timing issue).  There were no significant differences
between the logs of the successful and the abnormal runs.

The ordering of the MutationStage-1 and MutationStage-2 execution was
slightly different.  The good run had 28 calls from MutationStage-1 and 22
from MutationStage-2; the bad run had 30 and 20 respectively.

It looks like the system reports update success but manages to lose the
update later.

Does anyone have any idea how to approach debugging this?

Claude


Re: key apt by apache cassandra seems deprecated and cqlsh is broken

2022-08-09 Thread Claude Warren, Jr via dev
Could this be related to the deprecation of apt-key on your system?  You
don't specify what version of which distribution you are using.  However,
there is a good example of how to solve the issue at
https://www.linuxuprising.com/2021/01/apt-key-is-deprecated-how-to-add.html



On Tue, Aug 9, 2022 at 11:51 AM Dorian ROSSE  wrote:

> hello,
>
>
> i am getting this error when i try to update my system :
>
> '''W: http://www.apache.org/dist/cassandra/debian/dists/40x/InRelease:
> Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the
> DEPRECATION section in apt-key(8) for details.
> '''
>
> (for this error on my ubuntu system i have reach the ubuntu launchpad
> suport if that hold by their side)
>
> the apache cassandra is installed but the subprogram cqlsh is broken thus
> i have tried to install the module python asked but it seems unexisted :
>
> '''~$ sudo cqlsh
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 20, in <module>
> import cqlshlib
> ModuleNotFoundError: No module named 'cqlshlib'
> ~$ sudo pip3 install cqlshlib
> ERROR: Could not find a version that satisfies the requirement cqlshlib
> (from versions: none)
> ERROR: No matching distribution found for cqlshlib
> ~$ sudo python3 install cqlshlib
> python3: can't open file '/home/dorianrosse/install': [Errno 2] No such
> file or directory
> ~$ sudo pip install cqlshlib
> ERROR: Could not find a version that satisfies the requirement cqlshlib
> (from versions: none)
> ERROR: No matching distribution found for cqlshlib
> ~$ sudo python install cqlshlib
> python: can't open file 'install': [Errno 2] No such file or directory
> '''
>
> thanks you in advance to help myself repair both errors,
>
> regards.
>
>
> Dorian ROSSE.
>


[DISCUSS] Remove Dead Pull Requests

2022-08-10 Thread Claude Warren, Jr via dev
At the moment we have 222 open pull requests, some dating back 4 years.
For some, the repository from which they were pulled has been deleted.
For many there are branch conflicts.

Now, I am new here so please excuse any misstatements and attribute to
ignorance not malice any offence.

I would like to propose the  following:


   1. We accept simple typo corrections without a ticket.
   2. Add a "Propose Close" label
   3. We "Propose Close" any pull request for which the originating
   repository has been deleted.
   4. We "Propose Close" any pull request, other than simple typo
   corrections, that has been labeled missing-ticket for more than 30 days.
   5. We Close any pull request that has been in the "Propose Close" state
   for more than 30 days.

I don't have access to make any of these changes.  If granted access I
would be willing to manage the process.

Claude


Re: Cassandra project status update 2022-08-03

2022-08-10 Thread Claude Warren, Jr via dev
Perhaps flaky tests need to be handled differently.  Is there a way to
build a statistical model of the current flakiness of a test that we can
then use during testing to accept the failures?  If an acceptable level of
flakiness is established, then when a test fails it needs to be run again
multiple times to get a sample and ensure that the failure is not
statistically significant.
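A sketch of what such a statistical gate could look like (an assumption of
mine, not existing project tooling; the class and method names are invented):
given a test's historical flakiness rate, rerun a failing test n times and
treat it as a real failure only if the observed failure count is improbable
under the historical rate alone.

```java
// Hypothetical flakiness gate: a binomial tail test against a test's
// historical failure rate.
public class FlakyGate {

    /** P(X >= k) for X ~ Binomial(n, p): chance of seeing k or more failures. */
    static double binomialTail(int n, int k, double p) {
        double tail = 0.0;
        for (int i = k; i <= n; i++) {
            tail += binomialCoefficient(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
        }
        return tail;
    }

    /** n-choose-k computed multiplicatively to avoid factorial overflow. */
    static double binomialCoefficient(int n, int k) {
        double c = 1.0;
        for (int i = 0; i < k; i++) {
            c = c * (n - i) / (i + 1);
        }
        return c;
    }

    /** True when the reruns are inconsistent with historical flakiness alone. */
    static boolean isRealFailure(int reruns, int failures, double historicalRate,
                                 double significance) {
        return binomialTail(reruns, failures, historicalRate) < significance;
    }

    public static void main(String[] args) {
        // A test that historically fails 5% of the time: 1 failure in 10
        // reruns is unremarkable, 6 failures is almost certainly a real bug.
        System.out.println(isRealFailure(10, 1, 0.05, 0.01)); // false
        System.out.println(isRealFailure(10, 6, 0.05, 0.01)); // true
    }
}
```

The historical rate per test is exactly what the CircleCI repeat-run tasks
mentioned later in this thread could supply.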



On Wed, Aug 10, 2022 at 8:51 AM Benedict Elliott Smith 
wrote:

> 
> > We can start by putting the bar at a lower level and raise the level
> over time
>
> +1
>
> > One simple approach that has been mentioned several time is to run the
> new tests added by a given patch in a loop using one of the CircleCI tasks
>
> I think if we want to do this, it should be extremely easy - by which I
> mean automatic, really. This shouldn’t be too tricky I think? We just need
> to produce a diff of new test classes and methods within existing classes.
> If there doesn’t already exist tooling to do this, I can probably help out
> by putting together something to output @Test annotated methods within a
> source tree, if others are able to turn this into a part of the CircleCI
> pre-commit task (i.e. to pick the common ancestor with trunk, 4.1 etc, and
> run this task for each of the outputs). We might want to start
> standardising branch naming structures to support picking the upstream
> branch.
>
> > We should also probably revert newly committed patch if we detect that
> they introduced flakies.
>
> There should be a strict time limit for reverting a patch for this reason,
> as environments change and what is flaky now was not necessarily before.
>
> On 9 Aug 2022, at 12:57, Benjamin Lerer  wrote:
>
> At this point it is clear that we will probably never be able to remove
> some level of flakiness from our tests. For me the questions are: 1) Where
> do we draw the line for a release ? and 2) How do we maintain that line
> over time?
>
> In my opinion, not all flakies are equal. Some fail every 10 runs, some
> fail 1 in 1000 runs. I would personally draw the line based on that
> metric. With the CircleCI tasks that Andres has added we can easily get
> that information for a given test.
> We can start by putting the bar at a lower level and raise the level over
> time when most of the flakies that we hit are above that level.
>
> At the same time we should make sure that we do not introduce new flakies.
> One simple approach that has been mentioned several time is to run the new
> tests added by a given patch in a loop using one of the CircleCI tasks.
> That would allow us to minimize the risk of introducing flaky tests. We
> should also probably revert newly committed patch if we detect that they
> introduced flakies.
>
> What do you think?
>
>
>
>
>
> On Sun, Aug 7, 2022 at 12:24, Mick Semb Wever wrote:
>
>>
>>
>> With that said, I guess we can just revise on a regular basis what
>>> exactly are the last flakes and not numbers which also change quickly up
>>> and down with the first change in the Infra.
>>>
>>
>>
>> +1, I am in favour of taking a pragmatic approach.
>>
>> If flakies are identified and triaged enough that, with correlation from
>> both CI systems, we are confident that no legit bugs are behind them, I'm
>> in favour of going beta.
>>
>> I still remain in favour of somehow incentivising reducing other flakies
>> as well. Flakies that expose poor/limited CI infra, and/or tests that are
>> not as resilient as they could be, are still noise that indirectly reduce
>> our QA (and increase efforts to find and tackle those legit runtime
>> problems). Interested in hearing input from others here that have been
>> spending a lot of time on this front.
>>
>> Could it work if we say: all flakies must be ticketed, and test/infra
>> related flakies do not block a beta release so long as there are fewer than
>> the previous release? The intent here being pragmatic, but keeping us on a
>> "keep the campground cleaner" trajectory…
>>
>>
>


Re: [DISCUSS] Remove Dead Pull Requests

2022-08-11 Thread Claude Warren, Jr via dev
We can always re-open or re-create PRs that are auto-closed but are still
> relevant, but I think it looks bad for the project to have a large amount
> of stale unacted PRs.
>
> Em qua., 10 de ago. de 2022 às 11:08, C. Scott Andreas <
> sc...@paradoxica.net> escreveu:
>
> Claude, can you say more about the goal or purpose that closing tickets
> advances?
>
> There are quite a lot of tickets with patches attached that the project
> has either not been able to act on at the time; or which the original
> contributor started but was unable to complete. We’ve picked up many of
> these after a couple years and carried them to completion. Byte-comparable
> types come to mind. There are many, many more.
>
> Closing these tickets would be a very terminal action. If the goal is to
> distinguish what’s active from tickets that have gone quiet, adding a
> “dormant” label might work.
>
> - Scott
>
> On Aug 10, 2022, at 1:00 AM, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
> 
> At the moment we have 222 open pull requests.  Some dating back 4 years.
> For some, the repository from which they were pulled has been deleted.
> For many there are branch conflicts.
>
> Now, I am new here so please excuse any misstatements and attribute to
> ignorance not malice any offence.
>
> I would like to propose the  following:
>
>
>1. We accept simple typo corrections without a ticket.
>2. Add a "Propose Close" label
>3. We "Propose Close" any pull request for which the originating
>repository has been deleted.
>4. We "Propose Close" any ticket, other than simple typo corrections,
>that has been labeled missing-ticket for more than 30 days.
>5. We Close any pull request that has been in the "Propose Close"
>state for more than 30 days.
>
> I don't have access to make any of these changes.  If granted access I
> would be willing to manage the process.
>
> Claude
>
>
>


Re: [DISCUSS] Remove Dead Pull Requests

2022-08-11 Thread Claude Warren, Jr via dev
I agree the amount of work is somewhat overwhelming for the proposed
change, but I was referring to the lack of a Jira ticket blocking the pull
request.  At least that is how it looks to the new observer.  Perhaps we
should add a "trivial change" label for requests that do not have a ticket
and are trivial.

How many branches do the changes currently need to be applied to?  I assume
this goes up by 1 after the next release.

On Thu, Aug 11, 2022 at 9:36 AM Benjamin Lerer  wrote:

> Is there an objection to accepting "typo" corrections without a ticket?
>>
>
> One problem to be aware of is that those pull requests need to be
> converted in patches and merged manually up to trunk if they were done on
> older branches. So it might not look like it at first but it can be quite
> time consuming.
>
> Le jeu. 11 août 2022 à 10:07, Benedict  a écrit :
>
>> Those all seem like good suggestions to me
>>
>> On 11 Aug 2022, at 08:44, Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> 
>> My original goal was to reduce the number of pull requests in the backlog
>> as it appears, from the outside, that the project does not really care for
>> outside contributions when there are over 200 pull requests pending and
>> many of them multiple years old.  I guess that is an optics issue.  Upon
>> looking at the older backlog, there were a few that I felt could be closed
>> because they didn't have tickets, or were trivial (i.e. typo correction),
>> or for which the original repository no longer exists.  However, from the
>> conversation here, it seems like the older pull requests are being used as
>> a long term storage for ideas that have not come to fruition and for which
>> the original developer may no longer be active.
>>
>> Looking through the pull request backlog there are a number of requests
>> that are not associated with a ticket.  Perhaps we should add a pull
>> request template to GitHub that requests the associated ticket number when the
>> pull request is made.  The template can also request any other information
>> we think appropriate to speed acceptance of the request.  I would add a
>> "This is a trivial change" checkbox for things like typo changes.  Is
>> there any documentation on the pull request process?  I think I saw
>> something that said patches were requested, but I can't find it now.  We
>> could add a link to any such documentation in the template as well.
>>
>> Is there an objection to accepting "typo" corrections without a ticket?
>>
>>
>>
>> Claude
>>
>> On Wed, Aug 10, 2022 at 5:08 PM Josh McKenzie 
>> wrote:
>>
>>> I think of this from a discoverability and workflow perspective at least
>>> on the JIRA side, though many of the same traits apply to PR's. Some
>>> questions that come to mind:
>>>
>>> 1. Are people grepping through the backlog of open items for things to
>>> work on they'd otherwise miss if they were closed out?
>>> 2. Are people grepping via specific text phrases in the summary, with or
>>> without "resolution = unresolved",  to try and find things on a particular
>>> topic to work on?
>>> 3. Relying on labels? Components? Something else?
>>>
>>> My .02: folks that are new to the project probably need more guidance on
>>> what to look for to get engaged with which is served by the LHF +
>>> unresolved + status emails + @cassandra_mentors. Mid to long-timers are
>>> probably more likely to search for specific topics, but may search for open
>>> tickets with patches attached or Patch Available things (seems unlikely as
>>> most of us have areas we're focused on but is possible?)
>>>
>>> The status quo today (leave things open if work has been done on it
>>> and/or it's an idea that clearly still has some relevance) seems to satisfy
>>> the most use-cases and retain the most flexibility, so I'd advocate for us
>>> not making a change just to make a change. While we could add a tag or
>>> resolution that indicates something closed out due to it being stale, my
>>> intuition is that people will just tack on "resolution = unresolved OR
>>> labels = closed_stale" in the JIRA case or sift through all things not
>>> merged in the PR case to effectively end up with the same body of results
>>> they're getting today.
>>>
>>> Given the ability of JQL to sort and slice based on updated times as
>>> well, it's relatively trivial to exclude stale

[Proposal] add pull request template

2022-08-15 Thread Claude Warren, Jr via dev
Github provides the ability to add a pull request template [1].  I think
that such a template could assist in making the pull requests better.
Something like the text below, along with verifying that CASSANDRA-### will
link to Jira [2], should provide the information needed and remind
submitters of what is desired.

If there is agreement here, I'll open a pull request to add the
documentation and ask Apache devops to verify that the CASSANDRA-### will
link to Jira.

Claude

[1]
https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository
for more information.
 [2]
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/managing-repository-settings/configuring-autolinks-to-reference-external-resources

- start of text

Issue resolved #  CASSANDRA-

Pull request Description:





- [ ] Commits have been squashed to remove intermediate development
commit messages.
 - [ ] Key commit messages start with the issue number (CASSANDRA-)


either - [ ] this is a trivial documentation change. (e.g. fixes a typo)

or:
 - [ ] Tests are included.
 - [ ] Documentation changes and/or updates are included.


By submitting this pull request, I acknowledge that I am making a
contribution to the Apache Software Foundation under the terms and
conditions of the [Contributor's
Agreement](https://www.apache.org/licenses/contributor-agreements.html).



See the [Apache Cassandra "Contributing to Cassandra"
guide](https://cassandra.apache.org/_/development/index.html) and/or
the [Apache Cassandra "Working on Documentation"
guide](https://cassandra.apache.org/_/development/documentation.html)
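For reference, GitHub picks a pull request template up from a well-known
path in the repository; a minimal sketch of adding one (the checklist body
here is abbreviated — the full wording would be the template text above):

```shell
# GitHub reads a PR template from .github/pull_request_template.md
# (a file named pull_request_template.md in the repo root or docs/ also works).
mkdir -p .github
cat > .github/pull_request_template.md <<'EOF'
Issue resolved:  CASSANDRA-

Pull request Description:

- [ ] Commits have been squashed to remove intermediate development
      commit messages.
- [ ] Key commit messages start with the issue number (CASSANDRA-)
EOF
```

Once merged, GitHub pre-fills every new pull request's description with this
file's contents.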


Re: [Proposal] add pull request template

2022-08-15 Thread Claude Warren, Jr via dev
If there is consensus that the PR template is a good idea, I'll create a
branch and we can wrangle the words.

I think the pull request generally is against a specific branch but having
the branch listed in the template is not a bad idea.  I do want to avoid
having too many questions.  We could change the text "Pull request
Description:" to be something more descriptive like "Describe what the pull
request fixes, why it is needed, and what branch(s) you expect it to be
applied to."

Or perhaps the branch question should be separate. Opinions?



On Mon, Aug 15, 2022 at 10:00 AM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> I like auto linking, definitely a handy feature.
>
> I am not sure about the content of the pull request description. I
> would include what that PR is actually for / why it is necessary to
> merge it and into what branches a contributor expects that PR to be
> merged in. However, this might be omitted if all this information is
> in a JIRA ticket already, I find the correct auto linking to be the
> most crucial here.
>
> There might be a bullet point for adding relevant CI builds (Jenkins or
> Circle).
>
> I am not sure we are going to enforce a commit message to start with
> the issue number. The issue number is already mentioned in the commit
> message. I feel like this kind of stuff is not crucial for a PR to be
> opened, a committer who is actually going to merge it will take extra
> time and care when it comes to these formalities anyway. The reason
> why a PR should be merged should be the priority.
>
> On Mon, 15 Aug 2022 at 10:41, Claude Warren, Jr via dev
>  wrote:
> >
> > Github provides the ability to add a pull request template [1].  I think
> that such a template could assist in making the pull requests better.
> Something like the text below, along with verifying that CASSANDRA-### will
> link to Jira [2], should provide the information needed and remind
> submitters of what is desired.
> >
> > If there is agreement here, I'll open a pull request to add the
> documentation and ask Apache devops to verify that the CASSANDRA-### will
> link to Jira.
> >
> > Claude
> >
> > [1]
> https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository
> for more information.
> >  [2]
> https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/managing-repository-settings/configuring-autolinks-to-reference-external-resources
> >
> > - start of text
> >
> > Issue resolved #  CASSANDRA-
> >
> > Pull request Description:
> >
> >
> >
> > 
> >
> > - [ ] Commits have been squashed to remove intermediate development
> commit messages.
> >  - [ ] Key commit messages start with the issue number (CASSANDRA-)
> >
> >
> > either
> >  - [ ] this is a trivial documentation change. (e.g. fixes a typo)
> >
> > or:
> >  - [ ] Tests are included.
> >  - [ ] Documentation changes and/or updates are included.
> >
> >
> > By submitting this pull request, I acknowledge that I am making a
> contribution to the Apache Software Foundation under the terms and
> conditions of the [Contributor's Agreement](
> https://www.apache.org/licenses/contributor-agreements.html).
> >
> > 
> >
> > See the [Apache Cassandra "Contributing to Cassandra" guide](
> https://cassandra.apache.org/_/development/index.html) and/or the [Apache
> Cassandra "Working on Documentation" guide](
> https://cassandra.apache.org/_/development/documentation.html)
>


Re: [Proposal] add pull request template

2022-08-16 Thread Claude Warren, Jr via dev
I am all for simplification.  How about

- start of text 

Issue resolved:  CASSANDRA-



 - [ ] Jira ticket contains a description of: what is fixed, why it is
needed, and what branches to apply it to.

 - [ ] Commits have been squashed to remove intermediate development
commit messages.

 - [ ] Key commit messages start with the issue number (CASSANDRA-)


either - [ ] this is a trivial documentation change. (e.g. fixes a typo)

or:
 - [ ] Tests are included.
 - [ ] Documentation changes and/or updates are included.


By submitting this pull request, I acknowledge that I am making a
contribution to the Apache Software Foundation under the terms and
conditions of the [Contributor's
Agreement](https://www.apache.org/licenses/contributor-agreements.html).


See the [Apache Cassandra "Contributing to Cassandra"
guide](https://cassandra.apache.org/_/development/index.html) and/or
the [Apache Cassandra "Working on Documentation"
guide](https://cassandra.apache.org/_/development/documentation.html)



 end of text 

On Tue, Aug 16, 2022 at 8:42 AM Erick Ramirez 
wrote:

> +1 this is a great idea. But personally, I'm not too fussed about the
> level of detail that is in the template -- what is important is that
> contributors are reminded that there needs to be a ticket associated with
> contributions. Without being too prescriptive, aspiring contributors should
> really familiarise themselves with how to contribute[1] so they would know
> to search existing tickets first to avoid things like duplication.
>
>  Additionally, I personally prefer details about a contribution to be
> documented in a ticket rather than a PR because information stored in
> tickets is more persistent. Having said that, it doesn't hurt to have the
> details included in the PR as long as it is in the ticket too. Cheers!
>
>>


Re: [Proposal] add pull request template

2022-08-18 Thread Claude Warren, Jr via dev
Since there seems to be agreement, I opened a ticket (CASSANDRA-17837) and
a pull request (https://github.com/apache/cassandra/pull/1799) so that
the final text can be hashed out and accepted.

I also used the proposed template in the text of the pull request so that it
can be seen in all its glory 😇

On Thu, Aug 18, 2022 at 9:10 PM Josh McKenzie  wrote:

> I have never seen this
> kind of git merging strategy elsewhere, I am not sure if I am not
> experienced enough or we are truly unique the way we do things.
>
> I am very fond of this project and this community. THAT SAID ;) you could
> replace "kind of git merging strategy" with a lot of different things and
> have it equally apply on this project.
>
> Perils of being a mature long-lived project I suspect. I'm all for us
> doing the hard work of introspecting on how we do things and changing them
> to improve or match industry standards where applicable.
>
> On Thu, Aug 18, 2022, at 3:33 PM, Stefan Miklosovic wrote:
>
> Interesting, thanks for explicitly writing that down. I humbly think
> the CI and the convenience of the GitHub workflow is ultimately
> secondary when it comes to the code-base as such. Indeed, nice to
> have, but if it turns out to be uncomfortable in other ways, I guess
> we just have to live with what we have. TBH I have never seen this
> kind of git merging strategy elsewhere, I am not sure if I am not
> experienced enough or we are truly unique the way we do things.
> However, it does make sense.
>
> On Thu, 18 Aug 2022 at 21:28, Benedict 
> wrote:
> >
> > The benefits being extolled involve people setting up GitHub bots to
> integrate with PRs to run CI etc, which will require some non-trivial
> investment by somebody to put together
> >
> > The alternative merge strategy being discussed is not to merge, but to
> instead cherry-pick or rebase. This means we can produce separate PRs for
> each branch, that can be merged independently via the GitHub API. The
> downside of this is that there are no merge commits, while one upside of
> this is that there are no merge commits.
> >
> > On 18 Aug 2022, at 20:20, Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
> >
> > No chicken-egg to me. All it takes is ctrl+c & ctrl+v on your merging
> > commits. How would new merging strategy actually look like? I am all
> > ears. This seems to be quite nice as is if we stick to be more verbose
> > what we did.
> >
> > On Thu, 18 Aug 2022 at 20:27, Benedict  wrote:
> >
> >
> > Was it?
> >
> >
> > I mean, we’ve all (or most) I think worked on projects with those
> things, so we all know what the benefits are?
> >
> >
> > It’s fair to point out that we don’t have it even running for any branch
> yet. However there’s perhaps a chicken-and-egg situation, where I’m unsure
> the investment to develop can be justified by those who are able, if
> there’s a chance it will be discarded? I can’t see us maintaining a
> bifurcated process, where some patches go through automation and others
> don’t, so if we don’t change the merge strategy that work would presumably
> end up wasted.
> >
> >
> > On 18 Aug 2022, at 18:53, Mick Semb Wever  wrote:
> >
> >
> > 
> >
> >
> > That debatable benefit aside, not doing merge commits would also open up
> options for us to use PR's for merges and integrate running CI, and
> blocking on clean CI, pre-merge. Which has some other pretty big benefits.
> :)
> >
> >
> >
> >
> > The past agreement IIRC was to start doing those things on trunk-only so
> we can evaluate them for real.
>
>
>


Is this an MV bug?

2022-08-19 Thread Claude Warren, Jr via dev
# Table definitions

Table  [ Primary key ] other data
base   [ A B C ] D E
MV     [ D C ] A B E


# Initial  data
base   -> MV
[ a b c ] d e  -> [d c] a b e
[ a' b c ] d e -> [d c] a' b e


## Mutations -> expected outcome

M1: base [ a b c ] d e'  -> MV [ d c ] a b e'
M2: base [ a b c ] d' e -> MV [ d' c ] a b e

## processing bug
Assume lock can not be obtained during processing of M1.

The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )

Assume M2 obtains the lock and executes.

MV is now
[ d' c ] a b e

M1 then obtains the lock and executes

MV is now
[ d c ] a b e'
[ d' c] a b e

base is
[ a b c ] d e'

MV entry "[ d' c ] a b e" is orphaned
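The race above can be sketched as a toy simulation. This is a deliberately
simplified in-memory model, not Cassandra's actual view-maintenance code: it
assumes each writer computes the MV row to tombstone from the base row as it
believed it to be before waiting on the lock, which is exactly the stale-read
hazard being described.

```python
# Toy model: base rows keyed by (A, B, C); the MV re-keys the same data by (D, C).
base = {("a", "b", "c"): {"d": "d", "e": "e"}}
mv = {("d", "c"): {"a": "a", "b": "b", "e": "e"}}
BASE_KEY = ("a", "b", "c")

def apply_mutation(believed_old, update):
    """Apply a base-table update, using the writer's (possibly stale) view
    of the pre-update row to decide which MV row to tombstone."""
    new = {**believed_old, **update}
    # tombstone the MV row derived from the *believed* old state ...
    mv.pop((believed_old["d"], BASE_KEY[2]), None)
    # ... then write the new base cells and the MV row derived from `new`
    base[BASE_KEY].update(update)
    mv[(new["d"], BASE_KEY[2])] = {"a": BASE_KEY[0], "b": BASE_KEY[1], "e": new["e"]}

# M1 reads the base row, then loses the race for the partition lock.
m1_view = dict(base[BASE_KEY])                      # {'d': 'd', 'e': 'e'}
apply_mutation(dict(base[BASE_KEY]), {"d": "d'"})   # M2 acquires the lock first
apply_mutation(m1_view, {"e": "e'"})                # M1 runs against a stale view

# One base row remains, but two MV rows survive: ("d'", "c") is orphaned.
print(len(base), sorted(mv))
```

If M1 instead re-read the base row after acquiring the lock, it would
tombstone the ("d'", "c") row and no orphan would remain — which is why the
question of whether the snapshot happens before or after the lock matters.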


Re: Is this an MV bug?

2022-08-19 Thread Claude Warren, Jr via dev
If each mutation comes from a separate CQL statement they would be separate, no?


On Fri, Aug 19, 2022 at 10:17 AM Benedict  wrote:

> If M1 and M2 both operate over the same partition key they won’t be
> separate mutations, they should be combined into a single mutation before
> submission to SP.mutate
>
> > On 19 Aug 2022, at 10:05, Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
> >
> > 
> >
> > # Table definitions
> >
> > Table [ Primary key ] other data
> > base  [ A B C ] D E
> > MV[ D C ] A B E
> >
> >
> > # Initial  data
> > base   -> MV
> > [ a b c ] d e  -> [d c] a b e
> > [ a' b c ] d e -> [d c] a' b e
> >
> >
> > ## Mutations -> expected outcome
> >
> > M1: base [ a b c ] d e'  -> MV [ d c ] a b e'
> > M2: base [ a b c ] d' e -> MV [ d' c ] a b e
> >
> > ## processing bug
> > Assume lock can not be obtained during processing of M1.
> >
> > The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )
> >
> > Assume M2 obtains the lock and executes.
> >
> > MV is now
> > [ d' c ] a b e
> >
> > M1 then obtains the lock and executes
> >
> > MV is now
> > [ d c ] a b e'
> > [ d' c] a b e
> >
> > base is
> > [ a b c ] d e'
> >
> > MV entry "[ d' c ] a b e" is orphaned
> >
> >
>
>


Re: Is this an MV bug?

2022-08-19 Thread Claude Warren, Jr via dev
Perhaps my diagram was not clear.  I am starting with mutations on the base
table.  I assume they are not bundled together, as they come from separate CQL
statements.

On Fri, Aug 19, 2022 at 11:11 AM Claude Warren, Jr 
wrote:

> If each mutation comes from a separate CQL they would be separate, no?
>
>
> On Fri, Aug 19, 2022 at 10:17 AM Benedict  wrote:
>
>> If M1 and M2 both operate over the same partition key they won’t be
>> separate mutations, they should be combined into a single mutation before
>> submission to SP.mutate
>>
>> > On 19 Aug 2022, at 10:05, Claude Warren, Jr via dev <
>> dev@cassandra.apache.org> wrote:
>> >
>> > 
>> >
>> > # Table definitions
>> >
>> > Table [ Primary key ] other data
>> > base  [ A B C ] D E
>> > MV[ D C ] A B E
>> >
>> >
>> > # Initial  data
>> > base   -> MV
>> > [ a b c ] d e  -> [d c] a b e
>> > [ a' b c ] d e -> [d c] a' b e
>> >
>> >
>> > ## Mutations -> expected outcome
>> >
>> > M1: base [ a b c ] d e'  -> MV [ d c ] a b e'
>> > M2: base [ a b c ] d' e -> MV [ d' c ] a b e
>> >
>> > ## processing bug
>> > Assume lock can not be obtained during processing of M1.
>> >
>> > The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )
>> >
>> > Assume M2 obtains the lock and executes.
>> >
>> > MV is now
>> > [ d' c ] a b e
>> >
>> > M1 then obtains the lock and executes
>> >
>> > MV is now
>> > [ d c ] a b e'
>> > [ d' c] a b e
>> >
>> > base is
>> > [ a b c ] d e'
>> >
>> > MV entry "[ d' c ] a b e" is orphaned
>> >
>> >
>>
>>


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-21 Thread Claude Warren, Jr via dev
I am more interested in the motivation where it is stated:

Many users have the need of masking sensitive data, such as contact info,
> age, gender, credit card numbers, etc. Dynamic data masking (DDM) allows to
> obscure sensitive information while still allowing access to the masked
> columns, and without changing the stored data.


There is an unspoken assumption that the stored data format can not be
changed.  It feels like this solution is starting from a false premise.
Throughout the document there are guard statements about how this does not
replace encryption.  Isn't there an assumption here that encryption can not
be used?  Would we not be better served to build in an encryption strategy
that keeps the data encrypted until the user shows permissions to decrypt,
like the unmask property?  An encryption strategy that can work within the
Cassandra internals?

I think the issue is that there are some data fields that should not be
discoverable by unauthorized users/systems, and I think this solution masks
that issue.  I fear that this capability will be seized upon by pointy
haired managers as a cheaper alternative to encryption, regardless of the
warnings otherwise, and that as a whole will harm the Cassandra ecosystem.

Yes, encryption is more difficult to implement and will take longer, but
this feels like a sticking plaster that distracts from that underlying
issue.

my 0.02

On Mon, Aug 22, 2022 at 12:30 AM Andrés de la Peña 
wrote:

> > If the column names are the same for masked and unmasked data, it would
>> impact existing applications. I am curious what the transition plan look
>> like for applications that expect unmasked data?
>
> For example, let’s say you store SSNs and Birth dates. Upon enabling this
>> feature, let’s say the app user is not given the UNMASK permission. Now the
>> app is receiving masked values for these columns. This is fine for most
>> read only applications. However, a lot of times these columns may be used
>> as primary keys or part of primary keys in other tables. This would break
>> existing applications.
>> How would this work in mixed mode when new nodes in the cluster are
>> masking data and others aren’t? How would it impact the driver?
>> How would the application learn that the column values are masked? This
>> is important in case a user has UNMASK permission and then later taken
>> away. Again this would break a lot of applications.
>
>
> Changing the masking of a column is a schema change, and as such it can be
> risky for existing applications. However, differently to deleting a column
> or revoking a SELECT permission, suddenly activating masking might pass
> undetected for existing applications.
>
> Applications developed after the introduction of this feature can check
> the table schema to know if a column is masked or not. We can even add a
> specific system view to ease this, if we think it's worth it. However,
> administrators should not activate masking when there could be applications
> that are not aware of the feature. We should be clear about this in the
> documentation.
>
> This is the way data masking seems to work in the databases I've checked.
> I also though that we could just change the name of the column when it's
> masked to something as "masked(column_name)", as it is discussed in the CEP
> document. This would make it impossible to miss that a column is masked.
> However, applications should be prepared to use different column names when
> reading result sets, depending on whether the data is masked for them or
> not. None of the databases mentioned on the "other databases" section of
> the CEP does this kind of column renaming, so it might be a kind of exotic
> behaviour. wdyt?
>
> On Fri, 19 Aug 2022 at 19:17, Andrés de la Peña 
> wrote:
>
>> > This type of feature is very useful, but it may be easier to analyze
>>> this proposal if it’s compared with other DDM implementations from other
>>> databases? Would it be reasonable to add a table to the proposal comparing
>>> syntax and output from eg Azure SQL vs Cassandra vs whatever ?
>>
>>
>> Good idea. I have added a section at the end of the document briefly
>> describing how some other databases deal with data masking, and with links
>> to their documentation for the topic. I am not an expert in any of those
>> databases, so please take my comments there with a grain of salt.
>>
>> On Fri, 19 Aug 2022 at 17:30, Jeff Jirsa  wrote:
>>
>>> This type of feature is very useful, but it may be easier to analyze
>>> this proposal if it’s compared with other DDM implementations from other
>>> databases? Would it be reasonable to add a table to the proposal comparing
>>> syntax and output from eg Azure SQL vs Cassandra vs whatever ?
>>>
>>>
>>> On Aug 19, 2022, at 4:50 AM, Andrés de la Peña 
>>> wrote:
>>>
>>> 
>>> Hi everyone,
>>>
>>> I'd like to start a discussion about this proposal for dynamic data
>>> masking:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynam

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-24 Thread Claude Warren, Jr via dev
This seems to me to be a client display filter, applied at the last moment
as data are streaming back to the client.  It has no impact on any keys,
queries or secondary internal index or materialized view.  It simply
prevents the display from showing the complete value.  It does not preclude
determining what some values are by building carefully crafted queries.





On Wed, Aug 24, 2022 at 8:40 AM Benedict  wrote:

> Is it typical for a masking feature to make no effort to prevent
> unmasking? I’m just struggling to see the value of this without such
> mechanisms. Otherwise it’s just a default formatter, and we should consider
> renaming the feature IMO
>
> On 23 Aug 2022, at 21:27, Andrés de la Peña  wrote:
>
> 
> As mentioned in the CEP document, dynamic data masking doesn't try to
> prevent malicious users with SELECT permissions to indirectly guess the
> real value of the masked value. This can easily be done by just trying
> values on the WHERE clause of SELECT queries. DDM would not be a
> replacement for proper column-level permissions.
>
> The data served by the database is usually consumed by applications that
> present this data to end users. These end users are not necessarily the
> users directly connecting to the database. With DDM, it would be easy for
> applications to mask sensitive data that is going to be consumed by the end
> users. However, the users directly connecting to the database should be
> trusted, provided that they have the right SELECT permissions.
>
> In other words, DDM doesn't directly protect the data, but it eases the
> production of protected data.
>
> Said that, we could later go one step ahead and add a way to prevent
> untrusted users from inferring the masked data. That could be done adding a
> new permission required to use certain columns on WHERE clauses, different
> to the current SELECT permission. That would play especially well with
> column-level permissions, which is something that we still have pending.
>
> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz  wrote:
>
>> Applying this should prevent querying on a field, else you could leak its
>>> contents, surely?
>>>
>>
>> In theory, yes.  Although I could see folks doing something like this:
>>
>> SELECT COUNT(*) FROM patients
>> WHERE year_of_birth = 2002
>> AND date_of_birth >= '2002-04-01'
>> AND date_of_birth < '2002-11-01';
>>
>> In this case, the rows containing the masked key column(s) could be
>> filtered on without revealing the actual data.  But again, that's probably
>> better for a "phase 2" of the implementation.
>>
>> Agreed on not being a queryable field. That would also preclude secondary
>>> indexing, right?
>>
>>
>> Yes, that's my thought as well.
>>
>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker 
>> wrote:
>>
>>> Agreed on not being a queryable field. That would also preclude
>>> secondary indexing, right?
>>>
>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict  wrote:
>>>
 Applying this should prevent querying on a field, else you could leak
 its contents, surely? This pretty much prohibits using it in a clustering
 key, and a partition key with the ordered partitioner - but probably also a
 hashed partitioner since we do not use a cryptographic hash and the hash
 function is well defined.

 We probably also need to ensure that any ALLOW FILTERING queries on
 such a field are disabled.

 Plausibly the data could be cryptographically jumbled before using it
 in a primary key component (or permitting filtering), but it is probably
 easier and safer to exclude for now…

 On 23 Aug 2022, at 18:13, Aaron Ploetz  wrote:

 
 Some thoughts on this one:

 In a prior job, we'd give app teams access to a single keyspace, and
 two roles: a read-write role and a read-only role.  In some cases, a
 "privileged" application role was also requested.  Depending on the
 requirements, I could see the UNMASK permission being applied to the RW or
 privileged roles.  But if there's a problem on the table and the operators
 go in to investigate, they will likely use a SUPERUSER account, and they'll
 see that data.

 How hard would it be for SUPERUSERs to *not* automatically get the
 UNMASK permission?

 I'll also echo the concerns around masking primary key components.
 It's highly likely that certain personal data properties would be used as a
 partition or clustering key (ex: range query for people born within a
 certain timeframe).  In addition to the "breaks existing" concern, I'm
 curious about the challenges around getting that to work with the current
 primary key implementation.

 Does this first implementation only apply to payload (non-key)
 columns?  The examples in the CEP currently do not show primary key
 components being masked.

 Thanks,

 Aaron


 On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo 
 wrote:

> 
