Re: Role of Hadoop code in Cassandra 5.0

Berenguer Blasi Fri, 10 Mar 2023 00:56:42 -0800

+1 deprecate + removal

On 10/3/23 1:41, Jeremy Hanna wrote:

It was mainly to integrate with Hadoop - I used it from 0.6 to 1.2 in production prior to starting at DataStax, and at that time I was stitching together Cloudera's distribution of Hadoop with Cassandra. Back then there were others that used it as well. As far as I know, usage dropped off when the Spark Cassandra Connector got pretty mature. It enabled people to take an off-the-shelf Hadoop distribution and run the Hadoop processes on the same nodes or external to the Cassandra cluster, and get topology information to do things like Hadoop splits through the Hadoop interfaces. I think the version lag is an indication that it hasn't been used recently. Also, like others have said, the Spark Cassandra Connector is really what people should be using at this point imo. That, or depending on the use case, Apple's bulk reader: https://github.com/jberragan/spark-cassandra-bulkreader, which is mentioned on https://issues.apache.org/jira/browse/CASSANDRA-16222.

On Mar 9, 2023, at 12:00 PM, Rahul Xavier Singh <[email protected]> wrote:

What is the hadoop code for? For interacting from Hadoop via CQL, or Thrift if it's that old, or directly looking at SSTables? Been using C* since 2 and have never used it.


Agree to deprecate in the next possible 4.1.x version and remove in 5.0.

Rahul Singh

Chief Executive Officer | Business Platform Architect m: 202.905.2818 e: [email protected] li: http://linkedin.com/in/xingh ca: http://calendly.com/xingh

We create, support, and manage real-time global data & analytics platforms for the modern enterprise.

Anant | https://anant.us
3 Washington Circle, Suite 301
Washington, D.C. 20037

http://Cassandra.Link <http://cassandra.link/>: The best resources for Apache Cassandra

On Thu, Mar 9, 2023 at 12:53 PM Brandon Williams <[email protected]> wrote:


    I think if we reach consensus here that decides it. I too vote to
    deprecate in 4.1.x.  This means we would remove it in 5.0.

    Kind Regards,
    Brandon

    On Thu, Mar 9, 2023 at 11:32 AM Ekaterina Dimitrova
    <[email protected]> wrote:
    >
    > Deprecation sounds good to me, but I am not completely sure in
    which version we can do it. If it is possible to add a
    deprecation warning in the 4.x series or at least 4.1.x - I vote
    for that.
    >
    > On Thu, 9 Mar 2023 at 12:14, Jacek Lewandowski
    <[email protected]> wrote:
    >>
    >> Is it possible to deprecate it in the 4.1.x patch release? :)
    >>
    >>
    >> - - -- --- ----- -------- -------------
    >> Jacek Lewandowski
    >>
    >>
    >> On Thu, 9 Mar 2023 at 18:11 Brandon Williams <[email protected]>
    wrote:
    >>>
    >>> This is my feeling too, but I think we should accomplish this by
    >>> deprecating it first.  I don't expect anything will change after the
    >>> deprecation period.
    >>>
    >>> Kind Regards,
    >>> Brandon
    >>>
    >>> On Thu, Mar 9, 2023 at 11:09 AM Jacek Lewandowski
    >>> <[email protected]> wrote:
    >>> >
    >>> > I vote for removing it entirely.
    >>> >
    >>> > thanks
    >>> > - - -- --- ----- -------- -------------
    >>> > Jacek Lewandowski
    >>> >
    >>> >
    >>> > On Thu, 9 Mar 2023 at 18:07 Miklosovic, Stefan
    <[email protected]> wrote:
    >>> >>
    >>> >> Derek,
    >>> >>
    >>> >> I have a couple more points ... I do not think that
    extracting it to a separate repository is a "win". That code is on
    Hadoop 1.0.3. We would be spending a lot of work extracting it
    just to end up with 10-year-old code with occasional updates (in my
    humble opinion, just to make it compile again when the code
    around it changes). What good is there in that? We would have one more
    place to take care of ... Now we at least have it all in one place.
    >>> >>
    >>> >> I believe we have four options:
    >>> >>
    >>> >> 1) leave it there as it is for the next few years, with
    questionable and diminishing usage
    >>> >> 2) update it to Hadoop 3.3 (I wonder who is going to do that)
    >>> >> 3) do 2) and extract it to a separate repository, but if we do
    2) we can just leave it there
    >>> >> 4) remove it
    >>> >>
    >>> >> ________________________________________
    >>> >> From: Derek Chen-Becker <[email protected]>
    >>> >> Sent: Thursday, March 9, 2023 15:55
    >>> >> To: [email protected]
    >>> >> Subject: Re: Role of Hadoop code in Cassandra 5.0
    >>> >>
    >>> >> I think the question isn't "Who ... is still using that?"
    but more "are we actually going to support it?" If we're on a
    version that old, it would appear that we've basically abandoned
    it, although there do appear to have been refactoring commits (for
    other things) in the last couple of years. I would be in favor
    of removal from 5.0, but at the very least, could it be moved
    into a separate repo/package so that it's not pulling a
    relatively large dependency subtree from Hadoop into our main
    codebase?
    >>> >>
    >>> >> Cheers,
    >>> >>
    >>> >> Derek
    >>> >>
    >>> >> On Thu, Mar 9, 2023 at 6:44 AM Miklosovic, Stefan
    <[email protected]> wrote:
    >>> >> Hi list,
    >>> >>
    >>> >> I stumbled upon the Hadoop package again. I think there was
    some discussion about the relevance of the Hadoop code some time ago,
    but I would like to ask this again.
    >>> >>
    >>> >> Do you think Hadoop code (1) is still relevant in 5.0? Who
    in the industry is still using that?
    >>> >>
    >>> >> We might drop a lot of code and some Hadoop dependencies
    too (3) (even though their scope is "provided"). The version of Hadoop
    we build upon is 1.0.3, which was released 10 years ago. This code
    does not have any tests or documentation on the website.
    >>> >>
    >>> >> There seem to be issues like this (2), and it seems like
    the solution is, basically, to use the Spark Cassandra Connector
    instead, which I would say is quite reasonable.
    >>> >>
    >>> >> Regards
    >>> >>
    >>> >> (1) https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/hadoop
    >>> >> (2) https://lists.apache.org/thread/jdy5hdc2l7l29h04dqol5ylroqos1y2p
    >>> >> (3) https://github.com/apache/cassandra/blob/trunk/.build/parent-pom-template.xml#L507-L589
    >>> >>
    >>> >>
    >>> >> --
    >>> >> +---------------------------------------------------------------+
    >>> >> | Derek Chen-Becker                                             |
    >>> >> | GPG Key available at https://keybase.io/dchenbecker and       |
    >>> >> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
    >>> >> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
    >>> >> +---------------------------------------------------------------+
    >>> >>
    >>> >>
