[VOTE] Release Apache Cassandra 4.1.1

2023-03-16 Thread Miklosovic, Stefan
Proposing the test build of Cassandra 4.1.1 for release.

sha1: 8d91b469afd3fcafef7ef85c10c8acc11703ba2d
Git: 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.1.1-tentative
Maven Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-1284/org/apache/cassandra/cassandra-all/4.1.1/

The Source and Build Artifacts, and the Debian and RPM packages and 
repositories, are available here: 
https://dist.apache.org/repos/dist/dev/cassandra/4.1.1/

The vote will be open for 72 hours (longer if needed). Everyone who has tested 
the build is invited to vote. Votes by PMC members are considered binding. A 
vote passes if there are at least three binding +1s and no -1's.
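A minimal sketch of the pass condition stated above, assuming one vote per voter and reading "no -1's" literally as no -1 from anyone, binding or not. The `Vote` type and `vote_passes` function are hypothetical illustrations, not ASF tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members


def vote_passes(votes):
    """Literal reading of the rule in this email: at least three distinct
    binding +1 voters and no -1 vote from anyone (binding or not)."""
    binding_plus_ones = {v.voter for v in votes if v.binding and v.value == 1}
    any_minus_one = any(v.value == -1 for v in votes)
    return len(binding_plus_ones) >= 3 and not any_minus_one


votes = [
    Vote("pmc-a", 1, True),
    Vote("pmc-b", 1, True),
    Vote("pmc-c", 1, True),
    Vote("contributor-d", 1, False),
]
print(vote_passes(votes))  # True
```

Counting distinct voters means a PMC member who replies "+1" twice is still counted once.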

[1]: CHANGES.txt: 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.1.1-tentative
[2]: NEWS.txt: 
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.1.1-tentative


[DISCUSS] Change the usage of nodetool tablehistograms

2023-03-16 Thread guo Maxwell
Hello everyone,
The nodetool tablehistograms command takes an argument that can name only
one table, in the format "keyspace_name.table_name" or "keyspace_name
table_name", so that you can get the histograms of the specified table.

If no arguments are given, the histograms of all tables are printed. If more
than two arguments are given (whether or not their format is correct), the
histograms of all tables are also printed, which I consider a bug.

So the usage of nodetool tablehistograms is restricted: it outputs either
one table or all of them.

As described in CASSANDRA-18296, I will change the usage of nodetool
tablehistograms to support the following:
1. nodetool tablehistograms ks.tb1 ks.tb2  // print the histograms of the listed
tables, given as keyspace.table
2. nodetool tablehistograms ks1 ks2 ks3 ...  // print the histograms of all
tables in the listed keyspaces
3. nodetool tablehistograms -i ks1 ks2  // print the histograms of all tables
except those in the keyspaces listed after the -i option
4. nodetool tablehistograms -i ks ks.tb  // print the histograms of all tables
except those in keyspace ks and the table ks.tb
5. If no option is specified, the histograms of all tables are printed.

The new usage breaks compatibility with how the command worked previously,
which matters because this is a user-facing tool.
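A minimal sketch of how the five proposed forms could resolve to a set of tables, assuming a hypothetical `schema` dictionary that maps keyspace names to table names (the real tool gets this from the node) and choosing to reject unknown names instead of silently printing everything. `resolve_tables` is an illustration, not the CASSANDRA-18296 patch.

```python
def resolve_tables(args, schema):
    """Resolve the proposed argument forms into table names.

    schema: hypothetical dict of keyspace -> list of table names.
    Returns a sorted list of "keyspace.table" strings.
    """
    all_tables = {f"{ks}.{tb}" for ks, tables in schema.items() for tb in tables}
    exclude = bool(args) and args[0] == "-i"   # forms 3 and 4
    names = args[1:] if exclude else args
    if not names:
        # form 5 (or a bare -i): nothing to filter, so every table
        return sorted(all_tables)
    selected = set()
    for name in names:
        if "." in name:                         # keyspace.table (forms 1 and 4)
            if name not in all_tables:
                raise ValueError(f"unknown table: {name}")
            selected.add(name)
        else:                                   # whole keyspace (forms 2 and 3)
            if name not in schema:
                raise ValueError(f"unknown keyspace: {name}")
            selected.update(f"{name}.{tb}" for tb in schema[name])
    return sorted(all_tables - selected if exclude else selected)


schema = {"ks1": ["a", "b"], "ks2": ["c"]}
print(resolve_tables(["-i", "ks2", "ks1.a"], schema))  # ['ks1.b']
```

In this sketch the previously documented two-argument form `<keyspace> <table>`, for example `ks1 a`, is read as two keyspace names and fails with `unknown keyspace: a`, which is the compatibility break mentioned in the message.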

So, what do you think?

Thanks~~~


Re: Should we cut some new releases?

2023-03-16 Thread Benjamin Lerer
>
> Not sure about 4.0.9. We released 4.0.8 just a few weeks ago. I would do
> 4.1.1 first.


It is true that there have not been many fixes since 4.0.8. Nevertheless,
CASSANDRA-18125 is a significant issue and 4.0 is certainly more widely used
than 4.1 at this point, so I believe that we should also release a 4.0.9 version.

On Tue, 14 Mar 2023 at 17:27, Josh McKenzie wrote:

> +1
>
> On Tue, Mar 14, 2023, at 7:50 AM, Aleksey Yeshchenko wrote:
>
> +1
>
> On 14 Mar 2023, at 05:50, Berenguer Blasi 
> wrote:
>
> +1
> On 13/3/23 21:25, Jacek Lewandowski wrote:
>
> +1
>
> On Mon, 13 Mar 2023, 20:36, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
> Yes, I was waiting for CASSANDRA-18125 to be in.
>
> I can release 4.1.1 to staging tomorrow morning CET if nobody objects to that.
>
> Not sure about 4.0.9. We released 4.0.8 just a few weeks ago. I would do
> 4.1.1 first.
>
> 
> From: Ekaterina Dimitrova 
> Sent: Monday, March 13, 2023 18:12
> To: dev@cassandra.apache.org
> Subject: Re: Should we cut some new releases?
>
> NetApp Security WARNING: This is an external email. Do not click links or
> open attachments unless you recognize the sender and know the content is
> safe.
>
>
>
> +1
>
> On Mon, 13 Mar 2023 at 12:23, Benjamin Lerer <ble...@apache.org> wrote:
> Hi everybody,
>
> Benedict and Jon recently committed the patch for CASSANDRA-18125
> (https://issues.apache.org/jira/browse/CASSANDRA-18125), which fixes some
> serious problems at the memtable/flush level. Should we consider cutting
> some releases that contain this fix?
>
>


Re: [DISCUSS] Change the usage of nodetool tablehistograms

2023-03-16 Thread Bowen Song via dev

The documented command options are:

   nodetool tablehistograms [<keyspace> <table> | <keyspace.table>]


That means a single parameter is treated as a dot-separated keyspace and
table, while two parameters are treated as the keyspace and the table,
respectively.


To remain compatible with the documented behaviour, my suggestion is to 
change the command options to:


   nodetool tablehistograms [  [ [...]] |
[ [...]]]

Feel free to add the "all except ..." feature to the above.

This doesn't break backward compatibility in documented ways. It only 
changes the undocumented behaviour. If someone is using the undocumented 
behaviour, they must know things may break when the software is 
upgraded. We can just add a line to the NEWS.txt and let them update 
their scripts.
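One possible reading of this suggestion, sketched under the assumption that the two documented forms stay exactly as they are and each may simply repeat (a keyspace followed by one or more tables, or one or more `keyspace.table` names). The grammar in the docstring is an assumption, not necessarily the exact syntax proposed, and `parse_targets` is a hypothetical helper.

```python
def parse_targets(args):
    """Parse tablehistograms-style arguments into (keyspace, table) pairs.

    Assumed grammar:
        <keyspace> <table> [<table> ...] | <keyspace.table> [<keyspace.table> ...]
    Returns None when no arguments are given, meaning "all tables" as today.
    """
    if not args:
        return None
    if "." in args[0]:
        pairs = []
        for arg in args:
            keyspace, sep, table = arg.partition(".")
            if not (sep and keyspace and table) or "." in table:
                raise ValueError(f"expected <keyspace.table>, got {arg!r}")
            pairs.append((keyspace, table))
        return pairs
    keyspace, tables = args[0], args[1:]
    if not tables:
        raise ValueError("a keyspace must be followed by at least one table")
    if any("." in table for table in tables):
        raise ValueError("cannot mix <keyspace> <table> and <keyspace.table> forms")
    return [(keyspace, table) for table in tables]


# The two documented forms parse exactly as before.
print(parse_targets(["ks", "tb"]))   # [('ks', 'tb')]
print(parse_targets(["ks.tb"]))      # [('ks', 'tb')]
```

Anything beyond the two documented forms, such as `ks tb1 tb2` or `ks.tb1 ks.tb2`, is new behaviour, so only previously undocumented inputs change meaning.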


Re: Should we cut some new releases?

2023-03-16 Thread Miklosovic, Stefan
OK. We can continue with 4.0.9, then 3.11.15, then 3.0.29 - in this order.


Re: [EXTERNAL] Re: [DISCUSS] Next release date

2023-03-16 Thread Mike Adamson
Sorry, I realised that I hadn't included a completion date for CEP-7. We
are currently looking at completion in mid to end of April.

Mike

On Mon, 13 Mar 2023 at 11:34, Mike Adamson  wrote:

> CEP-7 Storage Attached Index is in review with ~430 files and ~70k LOC.
> The bulk of the project is in 3 main patches. The first patch (in-memory
> index and query path) is merged to the feature branch CASSANDRA-16052 and
> the second patch (on-disk write and literal / string index) is in review.
>
> Mike
>
> On Thu, 9 Mar 2023 at 09:13, Branimir Lambov  wrote:
>
>> CEPs 25 (trie-indexed sstables) and 26 (unified compaction strategy)
>> should both be ready for review by mid-April.
>>
>> Both are around 10k LOC, fairly isolated, and in need of a committer to
>> review.
>>
>> Regards,
>> Branimir
>>
>> On Mon, Mar 6, 2023 at 11:25 AM Benjamin Lerer  wrote:
>>
>>> Sorry, I realized that when I started the discussion I probably did not
>>> frame it well enough, as I see that it is now going in different directions.
>>> The concerns I am seeing are:
>>> 1) Too little time between releases is inefficient from a
>>> development perspective and from a user perspective: from a development
>>> point of view because we lack the time to deliver some features, and from a
>>> user perspective because users cannot keep up with the upgrades.
>>> 2) Some features are so anticipated (Accord being the one mentioned)
>>> that people would prefer to delay the release to make sure that it is
>>> available as soon as possible.
>>> 3) We do not know how long we need to go from the freeze to GA. We hope
>>> for 2 months but our last experience was 6 months. So delaying the release
>>> could mean not releasing this year.
>>> 4) For people doing marketing it is really hard to promote a product
>>> when you do not know when the release will come and what features might be
>>> there.
>>>
>>> All those concerns are probably made even worse by the fact that we do
>>> not have clear visibility on where we are.
>>>
>>> Should we clarify that part first by getting an idea of the status of
>>> the different CEPs and other big pieces of work? From there we could agree
>>> on some timeline for the freeze. We could then discuss how to make
>>> the time from freeze to GA predictable.
>>>
>>>
>>>
>>> On Sat, 4 Mar 2023 at 18:14, Josh McKenzie wrote:
>>>
 (for convenience's sake, I'm referring to both Major and Minor semver
 releases as "major" in this email)

 The big feature from our perspective for 5.0 is ACCORD (CEP-15) and I
 would advocate to delay until this has sufficient quality to be in
 production.

 This approach can be pretty unpredictable in this domain; often
 unforeseen things come up in implementation that can give you a long tail
 on something being production ready. For the record - I don't intend to
 single Accord out *at all* on this front, quite the opposite given how
 much rigor's gone into the design and implementation. I'm just thinking
 from my personal experience: everything I've worked on, overseen, or
 followed closely on this codebase always has a few tricks up its sleeve
 along the way to having edge-cases stabilized.

 Much like on some other recent topics, I think there's a nuanced middle
 ground where we take things on a case-by-case basis. Some factors that have
 come up in this thread that resonated with me:

 For a given potential release date 'X':
 1. How long has it been since the last release?
 2. How long do we expect qualification to take from a "freeze" (i.e. no
 new improvement or features, branch) point?
 3. What body of merged production ready work is available?
 4. What body of new work do we have high confidence will be ready
 within Y time?

 I think it's worth defining a loose "minimum bound and upper bound" on
 release cycles we want to try and stick with barring extenuating
 circumstances. For instance: try not to release sooner than maybe 10 months
 out from a prior major, and try not to release later than 18 months out
 from a prior major. Make exceptions if truly exceptional things land, are
 about to land, or bugs are discovered around those boundaries.

 Applying the above framework to what we have in flight, our last
 release date, expectations on CI, etc - targeting an early fall freeze
 (pending CEP status) and mid to late fall or December release "feels right"
 to me.

 With the exception, of course, that if something merges earlier, is
 stable, and we feel is valuable enough to cut a major based on that, we do
 it.

 ~Josh
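The loose bounds suggested above (no sooner than roughly 10 months and no later than roughly 18 months after the prior major) can be turned into a target window with simple month arithmetic. A minimal sketch; `release_window` and the example date are hypothetical, and the bounds are the soft guidelines from the message, not project policy.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_window(prior_major, min_months=10, max_months=18):
    """Return (earliest, latest) dates for the next major release."""
    return add_months(prior_major, min_months), add_months(prior_major, max_months)


# Hypothetical prior major released on 31 January 2023.
earliest, latest = release_window(date(2023, 1, 31))
print(earliest, latest)  # 2023-11-30 2024-07-31
```

The day is clamped because 31 November does not exist; the window is a guide for planning a freeze, not a hard deadline.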

 On Fri, Mar 3, 2023, at 7:37 PM, German Eichberger via dev wrote:

 Hi,

 We shouldn't release just for release's sake. Are there enough new
 features, and are they working well enough (quality!)?

 The big feature from our perspective for 5

Re: [DISCUSS] Change the usage of nodetool tablehistograms

2023-03-16 Thread Josh McKenzie
We could also consider augmenting the tool with new named arguments with the 
functionality you described and leave the positional usage intact.
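A minimal sketch of that idea, using Python's `argparse` purely for illustration (nodetool itself is written in Java): the positional `<keyspace> <table> | <keyspace.table>` form is left intact, while the new behaviour lives behind named options. The option names `--keyspaces`, `--tables` and `--exclude` are hypothetical.

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser(prog="tablehistograms")
    # Existing positional usage, unchanged: <keyspace> <table> | <keyspace.table>
    parser.add_argument("positional", nargs="*", default=[])
    # Hypothetical named options carrying the new behaviour.
    parser.add_argument("--keyspaces", nargs="+", default=[])
    parser.add_argument("--tables", nargs="+", default=[])
    parser.add_argument("--exclude", nargs="+", default=[])
    args = parser.parse_args(argv)
    # More than two positional arguments was never documented; reject it
    # instead of silently printing every table.
    if len(args.positional) > 2:
        parser.error("at most two positional arguments are accepted")
    if args.positional and (args.keyspaces or args.tables or args.exclude):
        parser.error("positional arguments cannot be combined with named options")
    return args


print(parse(["ks", "tb"]).positional)              # ['ks', 'tb']
print(parse(["--tables", "ks.a", "ks.b"]).tables)  # ['ks.a', 'ks.b']
```

Keeping the two styles mutually exclusive means existing scripts see no change, and new behaviour is only reachable through an explicit option.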


Re: [DISCUSS] New dependencies with Chronicle-Queue update

2023-03-16 Thread Mick Semb Wever
>  asm-analysis-9.4.jar
>  asm-commons-9.4.jar
>  asm-tree-9.4.jar
>  asm-util-9.4.jar


FYI, on further inspection of the posix dependency, I've excluded
these four asm* dependencies.


Re: [DISCUSS] Change the usage of nodetool tablehistograms

2023-03-16 Thread Jeremiah D Jordan
-1 on any change which breaks the previously documented usage.
+1 any additions to what the tool can do without breaking previously documented 
behavior.

> On Mar 16, 2023, at 7:42 AM, Josh McKenzie  wrote:
> 
> We could also consider augmenting the tool with new named arguments with the 
> functionality you described and leave the positional usage intact.



Re: [DISCUSS] New dependencies with Chronicle-Queue update

2023-03-16 Thread Derek Chen-Becker
Deletion is the highest form of engineering ;)

On Thu, Mar 16, 2023 at 7:09 AM Mick Semb Wever  wrote:

> >  asm-analysis-9.4.jar
> >  asm-commons-9.4.jar
> >  asm-tree-9.4.jar
> >  asm-util-9.4.jar
>
>
> FYI, on further inspection of the posix dependency, I've excluded
> these four asm* dependencies.
>


-- 
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+


Re: Role of Hadoop code in Cassandra 5.0

2023-03-16 Thread David Capwell
Aren't our deprecation rules that if we deprecate in 4.0.0 we can remove in 5.x,
but if we deprecate later in 4.x we need to wait for 6.x? I am cool with deprecating
this and willing to pull it into another repo if people (not me) are willing to
maintain it (otherwise just delete it).
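The rule as phrased in this question can be written down as a small function: something deprecated in X.0.0 becomes removable in the next major, while anything deprecated later in the X line must wait two majors. This is a sketch of that phrasing only, with a hypothetical `earliest_removal_major` helper; it is not a documented project policy.

```python
def earliest_removal_major(deprecated_in):
    """Earliest major version that may remove a feature deprecated in the given
    "major.minor.patch" release, under the rule as phrased in the question."""
    major, minor, patch = (int(part) for part in deprecated_in.split("."))
    if (minor, patch) == (0, 0):
        return major + 1   # deprecated in X.0.0 -> removable in X+1
    return major + 2       # deprecated later in X.y.z -> wait for X+2


print(earliest_removal_major("4.0.0"))  # 5
print(earliest_removal_major("4.1.1"))  # 6
```

Under this reading, deprecating the Hadoop code in a 4.1.x release would delay removal to 6.0, whereas Brandon's message quoted in this thread proposed deprecating in 4.1.x and removing in 5.0.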

> On Mar 10, 2023, at 1:13 AM, Jacek Lewandowski  
> wrote:
> 
> I've experimentally added 
> https://issues.apache.org/jira/browse/CASSANDRA-16984 to 
> https://issues.apache.org/jira/browse/CASSANDRA-18306 (post 4.0 cleanup)
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> On Fri, 10 Mar 2023 at 09:56, Berenguer Blasi wrote:
>> +1 deprecate + removal
>> 
>> On 10/3/23 1:41, Jeremy Hanna wrote:
>>> It was mainly to integrate with Hadoop - I used it from 0.6 to 1.2 in 
>>> production prior to starting at DataStax and at that time I was stitching 
>>> together Cloudera's distribution of Hadoop with Cassandra.  Back then there 
>>> were others that used it as well.  As far as I know, usage dropped off when 
>>> the Spark Cassandra Connector got pretty mature.  It enabled people to take 
>>> an off the shelf Hadoop distribution and run the Hadoop processes on the 
>>> same nodes or external to the Cassandra cluster and get topology 
>>> information to do things like Hadoop splits and things like that through 
>>> the Hadoop interfaces.  I think the version lag is an indication that it 
>>> hasn't been used recently.  Also, like others have said, the Spark 
>>> Cassandra Connector is really what people should be using at this point 
>>> imo.  That or depending on the use case, Apple's bulk reader: 
>>> https://github.com/jberragan/spark-cassandra-bulkreader that is mentioned 
>>> on https://issues.apache.org/jira/browse/CASSANDRA-16222.
>>> 
 On Mar 9, 2023, at 12:00 PM, Rahul Xavier Singh 
   wrote:
 
 What is the hadoop code for? For interacting from Hadoop via CQL, or 
 Thrift if it's that old, or directly looking at SSTables? Been using C* 
 since 2 and have never used it. 
 
 Agree to deprecate in next possible 4.1.x version and remove in 5.0 
 
 Rahul Singh
 Chief Executive Officer | Business Platform Architect
 m: 202.905.2818 e: rahul.si...@anant.us  li: 
 http://linkedin.com/in/xingh ca: http://calendly.com/xingh
 
 We create, support, and manage real-time global data & analytics platforms 
 for the modern enterprise.
 
 Anant | https://anant.us 
  3 Washington Circle, Suite 301
 Washington, D.C. 20037
 
 http://Cassandra.Link  : The best resources for 
 Apache Cassandra
 
 
 On Thu, Mar 9, 2023 at 12:53 PM Brandon Williams wrote:
> I think if we reach consensus here that decides it. I too vote to
> deprecate in 4.1.x.  This means we would remove it in 5.0.
> 
> Kind Regards,
> Brandon
> 
> On Thu, Mar 9, 2023 at 11:32 AM Ekaterina Dimitrova
> <e.dimitr...@gmail.com> wrote:
> >
> > Deprecation sounds good to me, but I am not completely sure in which 
> > version we can do it. If it is possible to add a deprecation warning in 
> > the 4.x series or at least 4.1.x - I vote for that.
> >
> > On Thu, 9 Mar 2023 at 12:14, Jacek Lewandowski
> > <lewandowski.ja...@gmail.com> wrote:
> >>
> >> Is it possible to deprecate it in the 4.1.x patch release? :)
> >>
> >>
> >> - - -- --- -  -
> >> Jacek Lewandowski
> >>
> >>
> >> On Thu, 9 Mar 2023 at 18:11, Brandon Williams wrote:
> >>>
> >>> This is my feeling too, but I think we should accomplish this by
> >>> deprecating it first.  I don't expect anything will change after the
> >>> deprecation period.
> >>>
> >>> Kind Regards,
> >>> Brandon
> >>>
> >>> On Thu, Mar 9, 2023 at 11:09 AM Jacek Lewandowski
> >>> <lewandowski.ja...@gmail.com> wrote:
> >>> >
> >>> > I vote for removing it entirely.
> >>> >
> >>> > thanks
> >>> > - - -- --- -  -
> >>> > Jacek Lewandowski
> >>> >
> >>> >
> >>> > On Thu, 9 Mar 2023 at 18:07, Miklosovic, Stefan wrote:
> >>> >>
> >>> >> Derek,
> >>> >>
> >>> >> I have a couple more points ... I do not think that extracting it to
> >>> >> a separate repository is a "win". That code is on Hadoop 1.0.3. We
> >>> >> would spend a lot of work extracting 10-year-old code that only
> >>> >> gets occasional updates (in my humble opinion, just to make it
> >>> >> compile again when the surrounding code changes). What good is
> >>> >> that? We would have one more place to take care of ...
> >>> >> 

Re: Role of Hadoop code in Cassandra 5.0

2023-03-16 Thread Miklosovic, Stefan
I think we already decided it in this thread.

I was specifically asking this question:

Would deprecation mean that the code has to stay there for the whole of 5.0, so we
can remove it for real in 6.0?

To which the response was:

I think if we reach consensus here that decides it. I too vote to
deprecate in 4.1.x.  This means we would remove it in 5.0.

Then a bunch of +1s followed and agreed with that explicitly.

I do not plan to maintain or extract it, personally.


Re: Role of Hadoop code in Cassandra 5.0

2023-03-16 Thread Jeremy Hanna
Regarding deprecation, while I support the deprecation and removal from the 
Cassandra codebase, I do think we should communicate that with the wider 
community (user thread?) so people aren't surprised - especially since it's 
already four months after the 4.1.0 release.  That would hopefully also 
encourage those interested in continuing support to extract it out into a 
separate library.

> On Mar 16, 2023, at 4:19 PM, Miklosovic, Stefan 
>  wrote:
> 
> I think we already decided it in this thread.
> 
> I was specifically asking this question:
> 
> Deprecation would mean that the code has to be there whole 5.0 so we can 
> remove it for real in 6.0?
> 
> To which the response was:
> 
> I think if we reach consensus here that decides it. I too vote to
> deprecate in 4.1.x.  This means we would remove it in 5.0.
> 
> Then bunch of +1s followed and agreed with that explicitly.
> 
> I do not plan to maintain nor extract that, personally.
> 
> 
> From: David Capwell mailto:dcapw...@apple.com>>
> Sent: Thursday, March 16, 2023 22:13
> To: dev@cassandra.apache.org 
> Subject: Re: Role of Hadoop code in Cassandra 5.0
> 
> NetApp Security WARNING: This is an external email. Do not click links or 
> open attachments unless you recognize the sender and know the content is safe.
> 
> 
> 
> Isn’t our deprecation rules that if we deprecate in 4.0.0 we can remove in 
> 5.x, but 4.x needs to wait for 6.x?  I am cool deprecating this and willing 
> to pull into another repo if people (not me) are willing to maintain it (else 
> just delete).
> 
> On Mar 10, 2023, at 1:13 AM, Jacek Lewandowski  
> wrote:
> 
> I've experimentally added 
> https://issues.apache.org/jira/browse/CASSANDRA-16984 to 
> https://issues.apache.org/jira/browse/CASSANDRA-18306 (post 4.0 cleanup)
> 
> - - -- --- -  -
> Jacek Lewandowski
> 
> 
> pt., 10 mar 2023 o 09:56 Berenguer Blasi  > 
> napisał(a):
> 
> +1 deprecate + removal
> 
> On 10/3/23 1:41, Jeremy Hanna wrote:
> It was mainly to integrate with Hadoop - I used it from 0.6 to 1.2 in 
> production prior to starting at DataStax and at that time I was stitching 
> together Cloudera's distribution of Hadoop with Cassandra.  Back then there 
> were others that used it as well.  As far as I know, usage dropped off when 
> the Spark Cassandra Connector got pretty mature.  It enabled people to take 
> an off the shelf Hadoop distribution and run the Hadoop processes on the same 
> nodes or external to the Cassandra cluster and get topology information to do 
> things like Hadoop splits and things like that through the Hadoop interfaces. 
>  I think the version lag is an indication that it hasn't been used recently.  
> Also, like others have said, the Spark Cassandra Connector is really what 
> people should be using at this point imo.  That or depending on the use case, 
> Apple's bulk reader: https://github.com/jberragan/spark-cassandra-bulkreader 
> that is mentioned on https://issues.apache.org/jira/browse/CASSANDRA-16222.
> 
> On Mar 9, 2023, at 12:00 PM, Rahul Xavier Singh wrote:
> 
> What is the Hadoop code for? For interacting from Hadoop via CQL, or Thrift 
> if it's that old, or directly looking at SSTables? Been using C* since 2 and 
> have never used it.
> 
> Agree to deprecate in the next possible 4.1.x version and remove in 5.0.
> 
> Rahul Singh
> Chief Executive Officer | Business Platform Architect
> m: 202.905.2818 e: rahul.si...@anant.us li: http://linkedin.com/in/xingh ca: http://calendly.com/xingh
> 
> We create, support, and manage real-time global data & analytics platforms 
> for the modern enterprise.
> 
> Anant | https://anant.us
> 3 Washington Circle, Suite 301
> Washington, D.C. 20037
> 
> http://Cassandra.Link: The best resources for Apache Cassandra
> 
> 
> On Thu, Mar 9, 2023 at 12:53 PM Brandon Williams wrote:
> I think if we reach consensus here that decides it. I too vote to
> deprecate in 4.1.x.  This means we would remove it in 5.0.
> 
> Kind Regards,
> Brandon
> 
> On Thu, Mar 9, 2023 at 11:32 AM Ekaterina Dimitrova wrote:
>> 
>> Deprecation sounds good to me, but I am not completely sure in which version 
>> we can do it. If it is possible to add a deprecation warning in the 4.x 
>> series or at least 4.1.x - I vote for that.
>> 
>> On Thu, 9 Mar 2023 at 12:14, Jacek Lewandowski wrote:
>>> 
>>> Is it possible to deprecate it in the 4.1.x patch release? :)
>>> 
>>> 
>>> - - 

Re: Role of Hadoop code in Cassandra 5.0

2023-03-16 Thread Miklosovic, Stefan
I reached out to the Hadoop Slack channel and asked whether somebody could
help us with the update. The first response was something like "why do you
ask? we are not going to spend time on updating it for you" (fair enough); the
next responses were along the lines of "if this is so old and not maintained, it
does not seem like people even care", which is totally spot on. Who are we
really addressing here? This integration became practically irrelevant the day
the Spark connector matured, and it will be even less relevant once 5.0 is out.

The actual removal in 6.0 means that this will not go away sooner than ...
2025? Do we really want to be removing 12-year-old code two years from now?
That also means that we need to make sure it at least compiles, etc. I would say
that nobody cares already.

I do not have a problem with dropping an email to the user list. I'll get to it
tomorrow.


From: Jeremy Hanna 
Sent: Thursday, March 16, 2023 22:27
To: dev@cassandra.apache.org
Subject: Re: Role of Hadoop code in Cassandra 5.0

Regarding deprecation, while I support the deprecation and removal from the
Cassandra codebase, I do think we should communicate that to the wider
community (user thread?) so people aren't surprised, especially since we are
already four months past the 4.1.0 release. That would hopefully also
encourage those interested in continued support to extract it into a
separate library.

On Mar 16, 2023, at 4:19 PM, Miklosovic, Stefan wrote:

I think we already decided it in this thread.

I was specifically asking this question:

Deprecation would mean that the code has to stay there throughout 5.0 so we can
remove it for real in 6.0?

To which the response was:

I think if we reach consensus here that decides it. I too vote to
deprecate in 4.1.x.  This means we would remove it in 5.0.

Then a bunch of +1s followed, explicitly agreeing with that.

I do not plan to maintain or extract that, personally.

