Re: Role of Hadoop code in Cassandra 5.0

Miklosovic, Stefan Thu, 09 Mar 2023 09:07:14 -0800

Derek,

I have a couple more points ... I do not think that extracting it to a separate 
repository is a "win". That code is on Hadoop 1.0.3. We would be spending a lot 
of work on extracting it just to end up with 10-year-old code that gets 
occasional updates (in my humble opinion, only to make it compile again when 
the code around it changes). What good is that? We would have one more place 
to take care of ... Now we at least have it all in one place.


I believe we have four options:

1) leave it there, so it stays as it is for the coming years with questionable 
and diminishing usage
2) update it to Hadoop 3.3 (I wonder who is going to do that)
3) do 2) and extract it to a separate repository, but if we do 2) we can just 
leave it there
4) remove it

________________________________________
From: Derek Chen-Becker <[email protected]>
Sent: Thursday, March 9, 2023 15:55
To: [email protected]
Subject: Re: Role of Hadoop code in Cassandra 5.0

I think the question isn't "Who ... is still using that?" but more "are we 
actually going to support it?" If we're on a version that old it would appear 
that we've basically abandoned it, although there do appear to have been 
refactoring commits (for other things) in the last couple of years. I would be 
in favor of removal from 5.0, but at the very least, could it be moved into a 
separate repo/package so that it's not pulling a relatively large dependency 
subtree from Hadoop into our main codebase?

Cheers,

Derek

On Thu, Mar 9, 2023 at 6:44 AM Miklosovic, Stefan 
<[email protected]> wrote:
Hi list,

I stumbled upon the Hadoop package again. I think there was some discussion 
about the relevancy of the Hadoop code some time ago, but I would like to ask 
this again.

Do you think Hadoop code (1) is still relevant in 5.0? Who in the industry is 
still using that?

We might drop a lot of code and some Hadoop dependencies too (3) (even though 
their scope is "provided"). The version of Hadoop we build upon is 1.0.3, which 
was released 10 years ago. This code has neither tests nor documentation on 
the website.

There seem to be issues like this one (2), and it seems the solution is, 
basically, to use the Spark Cassandra Connector instead, which I would say is 
quite reasonable.

Regards

(1) 
https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/hadoop
(2) https://lists.apache.org/thread/jdy5hdc2l7l29h04dqol5ylroqos1y2p
(3) 
https://github.com/apache/cassandra/blob/trunk/.build/parent-pom-template.xml#L507-L589


--
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+
