Re: Compaction strategy contribution

2016-07-12 Thread Marcus Eriksson
Any code specific questions can be asked here or in #cassandra-dev on
freenode.

Discussion regarding usefulness etc. is probably best kept in a JIRA
ticket.

/Marcus

On Mon, Jul 11, 2016 at 7:06 PM, Pedro Gordo wrote:

> Hi all
>
> I'm finishing an MSc in which my final project is to implement a new
> compaction strategy in Cassandra. I've discussed the main points of the
> strategy with other community members and received valuable feedback.
> However, I understand this will be a tough challenge for someone who has
> never worked with Cassandra, but after getting to know the technology, I've
> found it fascinating. This, combined with always wanting to contribute to an
> open source project, led me to choose it as the topic for my MSc project.
>
> But because this is my first time contributing to an open source project,
> I've some questions on how to proceed correctly. Looking at the Contribute
> page, I see that we're
> supposed to create a ticket before starting working on it, so should I just
> create one or does the strategy usefulness need to be validated by someone
> before? In this case, should I just proceed and implement it, or do
> something else? And finally, is this the correct mailing list to be asking
> this sort of questions? :)
>
> As for the code itself, in case I have a question like "Should we be using
> an abstract class for compaction classes?" or "What is this method supposed
> to do?", can I ask here?
> What is the best course of action to learn about the details of the code in
> Cassandra? I already saw that it has some comments, but they probably won't
> be enough for me.
>
> The strategy I have in mind will be very simple until I finish the MSc.
> After that, I'll improve it with other features and feedback I got, but for
> the moment, it'll rely on a time interval (probably scheduled at specific
> hours, maybe during a time with less traffic on the system). During that
> time interval, the rows will be made unique across all SSTables, but only
> if, after a prior analysis, we find that the row exists in a certain number
> of SSTables above a certain threshold.
>
> I suppose it's a naive strategy, but the aim here is to give me experience
> with C*, and of course I'll be happy to take suggestions. But I'll probably
> only use the ideas after delivering the project because, at the moment, I
> need to keep it simple. Otherwise, I'll never be able to deliver the
> project. :)
>
> Sorry for the long email, and thanks for all the help in advance! I'm very
> excited about this project and look forward to being part of this
> community!
>
> Best regards
> Pedro Gordo
>


MSc Project - compaction strategy

2016-07-12 Thread Pedro Gordo
Hi all

I'm finishing an MSc in which my final project is to implement a new
compaction strategy in Cassandra. I've discussed the main points of the
strategy with other community members and received valuable feedback.
However, I understand this will be a tough challenge for someone who has
never worked with Cassandra, but after getting to know the technology, I've
found it fascinating. Since I wanted to contribute to an open source
project in my MSc Project, this makes Cassandra the ideal technology to go
forward, and hence why I've chosen it.

However, since this is my first time contributing to an open source
project, I've some questions on how to proceed correctly. Looking at the How
To Contribute page, I
see that we're supposed to create a ticket before starting working on it,
however, in this case, does someone need to validate the usefulness of the
strategy or can I just proceed and implement it, or do something else?
Also, is this the correct mailing list to be asking this sort of questions?
:)

As for the code itself, if I have a question like "Should we be using an
abstract class for compaction classes?" or "What is this method supposed to
do?", can I ask it here? What is the best course of action to learn about
the details of the code in Cassandra? I already saw that it has some
comments, but they probably won't be enough.

The strategy I have in mind will be very simple until I finish the MSc.
After the submission, I'll improve it with other features and feedback I
got, but for the moment, I'll keep it at a basic level. The strategy will
start only during certain periods of time (for example a time of the day
where the cluster has little traffic (1)), during which, the rows will be
made unique across all SSTables. These new tables will be capped at a
configurable size, so after compaction, we can have multiple tables
created. This operation only happens if, after a prior analysis, we find
that the row exists in a number of SSTables above a certain threshold. What
I'm trying to address here is the continuous high CPU usage of the LCS (1),
but also the need for lots of disc space when we have big SSTables
resulting from STCS. I suppose it's a naive strategy, but the aim here is
to give me experience with C*, and of course I'll be happy to take
suggestions. But I'll probably only use the ideas after delivering the
project because, at the moment, I need to keep it simple. Otherwise, I'll
never be able to submit it. :)
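
The trigger and output rules described above can be sketched roughly as follows. This is only an illustrative model, not Cassandra code: every name here (WindowedDedupSketch, inWindow, rowsAboveThreshold, compact, shouldCompact) is hypothetical, each SSTable is modelled as a plain set of row keys, the size cap is modelled as a row count, and "above a certain threshold" is assumed to mean strictly greater than.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: an SSTable is just the set of row keys it contains.
// None of these names come from the Cassandra code base.
class WindowedDedupSketch {

    // True if 'now' falls in [start, end). Windows may wrap past midnight;
    // start == end is treated as an all-day window.
    static boolean inWindow(LocalTime now, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        return !now.isBefore(start) || now.isBefore(end);
    }

    // Row keys present in strictly more than 'threshold' SSTables (sorted).
    static Set<String> rowsAboveThreshold(List<Set<String>> sstables, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> table : sstables) {
            for (String key : table) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Compaction runs only inside the window and only if the prior analysis
    // finds at least one row spread over more than 'threshold' SSTables.
    static boolean shouldCompact(LocalTime now, LocalTime start, LocalTime end,
                                 List<Set<String>> sstables, int threshold) {
        return inWindow(now, start, end)
                && !rowsAboveThreshold(sstables, threshold).isEmpty();
    }

    // Make every row key unique across all inputs, then split the result into
    // output tables of at most 'maxRowsPerTable' keys each.
    static List<List<String>> compact(List<Set<String>> sstables, int maxRowsPerTable) {
        if (maxRowsPerTable < 1) {
            throw new IllegalArgumentException("maxRowsPerTable must be >= 1");
        }
        Set<String> unique = new TreeSet<>();
        for (Set<String> table : sstables) {
            unique.addAll(table);
        }
        List<List<String>> out = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : unique) {
            current.add(key);
            if (current.size() == maxRowsPerTable) {
                out.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }
}
```

A real strategy would also have to merge cell versions and tombstones and would measure the cap in bytes; the sketch leaves all of that out to keep the decision logic visible.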

Sorry for the long email, and thanks for all the help in advance! I'm very
excited about this project and look forward to being part of this community!

Best regards
Pedro Gordo


Re: MSc Project - compaction strategy

2016-07-12 Thread Robert Stupp
As Marcus already mentioned, the best place to discuss the idea of your
compaction strategy is a JIRA ticket.
It would be best to include as much detail (written, not coded) as necessary to
understand why this compaction strategy is useful and how it works.

Implementation questions and clarifications can go to #cassandra-dev on IRC.

Robert

—
Robert Stupp
@snazy

> On 12 Jul 2016, at 19:42, Pedro Gordo  wrote:
> 
> Hi all
> 
> I'm finishing an MSc in which my final project is to implement a new
> compaction strategy in Cassandra. I've discussed the main points of the
> strategy with other community members and received valuable feedback.
> However, I understand this will be a tough challenge for someone who has
> never worked with Cassandra, but after getting to know the technology, I've
> found it fascinating. Since I wanted to contribute to an open source
> project in my MSc Project, this makes Cassandra the ideal technology to go
> forward, and hence why I've chosen it.
> 
> However, since this is my first time contributing to an open source
> project, I've some questions on how to proceed correctly. Looking at the How
> To Contribute  page, I
> see that we're supposed to create a ticket before starting working on it,
> however, in this case, does someone need to validate the usefulness of the
> strategy or can I just proceed and implement it, or do something else?
> Also, is this the correct mailing list to be asking this sort of questions?
> :)
> 
> As for the code itself, if I have a question like "Should we be using an
> abstract class for compaction classes?" or "What is this method supposed to
> do?", can I ask it here? What is the best course of action to learn about
> the details of the code in Cassandra? I already saw that it has some
> comments, but probably won't be enough.
> 
> The strategy I have in mind will be very simple until I finish the MSc.
> After the submission, I'll improve it with other features and feedback I
> got, but for the moment, I'll keep it at a basic level. The strategy will
> start only during certain periods of time (for example a time of the day
> where the cluster has little traffic (1)), during which, the rows will be
> made unique across all SSTables. These new tables will be capped at a
> configurable size, so after compaction, we can have multiple tables
> created. This operation only happens if, after a prior analysis, we find
> that the row exists in a number of SSTables above a certain threshold. What
> I'm trying to address here is the continuous high CPU usage of the LCS (1),
> but also the need for lots of disc space when we have big SSTables
> resulting from STCS. I suppose it's a naive strategy, but the aim here is
> to give me experience with C*, and of course I'll be happy to take
> suggestions. But I'll probably only use the ideas after delivering the
> project because, at the moment, I need to keep it simple. Otherwise, I'll
> never be able to submit it. :)
> 
> Sorry for the long email, and thanks for all the help in advance! I'm very
> excited about this project and look forward to being part of this community!
> 
> Best regards Pedro Gordo



Re: MSc Project - compaction strategy

2016-07-12 Thread Pedro Gordo
Hi

Yes, I just saw Marcus's reply now, sorry for the duplicate email. The email
filters were not set up correctly. Thanks to both!

Best regards

Pedro Gordo

On 12 July 2016 at 12:39, Robert Stupp  wrote:

> As Marcus already mentioned, the best place to discuss the idea of your
> compaction strategy is a JIRA ticket.
> It would be best to include as much detail (written, not coded) as necessary
> to understand why this compaction strategy is useful and how it works.
>
> Implementation questions and clarifications can go to #cassandra-dev on IRC.
>
> Robert
>
> —
> Robert Stupp
> @snazy