Re: Role of Hadoop code in Cassandra 5.0

2023-03-10 Thread Berenguer Blasi

+1 deprecate + removal

On 10/3/23 1:41, Jeremy Hanna wrote:
It was mainly to integrate with Hadoop - I used it from 0.6 to 1.2 in
production prior to starting at DataStax and at that time I was
stitching together Cloudera's distribution of Hadoop with Cassandra.
Back then there were others that used it as well.  As far as I know,
usage dropped off when the Spark Cassandra Connector got pretty
mature.  It enabled people to take an off-the-shelf Hadoop
distribution and run the Hadoop processes on the same nodes or
external to the Cassandra cluster and get topology information to do
things like Hadoop splits and things like that through the Hadoop
interfaces.  I think the version lag is an indication that it hasn't
been used recently.  Also, like others have said, the Spark Cassandra
Connector is really what people should be using at this point imo.
That or depending on the use case, Apple's bulk reader:
https://github.com/jberragan/spark-cassandra-bulkreader that is
mentioned on https://issues.apache.org/jira/browse/CASSANDRA-16222.


On Mar 9, 2023, at 12:00 PM, Rahul Xavier Singh wrote:


What is the hadoop code for? For interacting from Hadoop via CQL, or 
Thrift if it's that old, or directly looking at SSTables? Been using 
C* since 2 and have never used it.


Agree to deprecate in next possible 4.1.x version and remove in 5.0

Rahul Singh


On Thu, Mar 9, 2023 at 12:53 PM Brandon Williams wrote:


I think if we reach consensus here that decides it. I too vote to
deprecate in 4.1.x.  This means we would remove it in 5.0.

Kind Regards,
Brandon

On Thu, Mar 9, 2023 at 11:32 AM Ekaterina Dimitrova wrote:
>
> Deprecation sounds good to me, but I am not completely sure in
> which version we can do it. If it is possible to add a
> deprecation warning in the 4.x series or at least 4.1.x - I vote
> for that.
>
> On Thu, 9 Mar 2023 at 12:14, Jacek Lewandowski wrote:
>>
>> Is it possible to deprecate it in the 4.1.x patch release? :)
>>
>>
>> - - -- --- -  -
>> Jacek Lewandowski
>>
>>
>> On Thu, 9 Mar 2023 at 18:11, Brandon Williams wrote:
>>>
>>> This is my feeling too, but I think we should accomplish this by
>>> deprecating it first.  I don't expect anything will change after the
>>> deprecation period.
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>> On Thu, Mar 9, 2023 at 11:09 AM Jacek Lewandowski wrote:
>>> >
>>> > I vote for removing it entirely.
>>> >
>>> > thanks
>>> > - - -- --- -  -
>>> > Jacek Lewandowski
>>> >
>>> >
>>> > On Thu, 9 Mar 2023 at 18:07, Miklosovic, Stefan wrote:
>>> >>
>>> >> Derek,
>>> >>
>>> >> I have a couple more points ... I do not think that
>>> >> extracting it to a separate repository is a "win". That code is on
>>> >> Hadoop 1.0.3. We would be spending a lot of work on extracting it
>>> >> just to extract 10-year-old code with occasional updates (in my
>>> >> humble opinion just to make it compilable again if the code
>>> >> around changes). What good is that? We would have one more
>>> >> place to take care of ... Now we at least have it all in one place.
>>> >>
>>> >> I believe we have four options:
>>> >>
>>> >> 1) leave it there as it is for the coming years, with
>>> >>    questionable and diminishing usage
>>> >> 2) update it to Hadoop 3.3 (I wonder who is going to do that)
>>> >> 3) do 2) and extract it to a separate repository, but if we do
>>> >>    2) we can just leave it there
>>> >> 4) remove it
>>> >>
>>> >> 
>>> >> From: Derek Chen-Becker 
>>> >> Sent: Thursday, March 9, 2023 15:55
>>> >> To: dev@cassandra.apache.org
>>> >> Subject: Re: Role of Hadoop code in Cassandra 5.0
>>> >>
>>> >> I think the question isn't "Who ... is still using that?"
>>> >> but more "are we actually going to support it?" If we're on a
>>> >> version that old it would appear that we've basically abandoned
>>> >> it, although there do appear to have been refactoring commits
>>> >> (for other things) in the last couple of years. I would be in favor
>>> >> of removal from 5.0, but at the very least, could it be moved
>>> >> into a separ

Re: Role of Hadoop code in Cassandra 5.0

2023-03-10 Thread Jacek Lewandowski
I've experimentally added
https://issues.apache.org/jira/browse/CASSANDRA-16984 to
https://issues.apache.org/jira/browse/CASSANDRA-18306 (post 4.0 cleanup)

- - -- --- -  -
Jacek Lewandowski

