Any code-specific questions can be asked here or in #cassandra-dev on
freenode.

Discussion about the strategy's usefulness etc. is probably best kept in a
JIRA ticket.

/Marcus

On Mon, Jul 11, 2016 at 7:06 PM, Pedro Gordo <pedro.gordo1...@gmail.com>
wrote:

> Hi all
>
> I'm finishing an MSc in which my final project is to implement a new
> compaction strategy in Cassandra. I've discussed the main points of the
> strategy with other community members and received valuable feedback.
> However, I understand this will be a tough challenge for someone who has
> never worked with Cassandra, but after getting to know the technology, I've
> found it fascinating. This, combined with always wanting to contribute to an
> open source project, led me to choose it as the topic for my MSc project.
>
> But because this is my first time contributing to an open source project,
> I have some questions on how to proceed correctly. Looking at the Contribute
> <http://wiki.apache.org/cassandra/HowToContribute> page, I see that we're
> supposed to create a ticket before starting work on it. Should I just
> create one, or does the strategy's usefulness need to be validated by
> someone first? In that case, should I just proceed and implement it, or do
> something else? And finally, is this the correct mailing list to be asking
> this sort of question? :)
>
> As for the code itself, in case I have a question like "Should we be using
> an abstract class for compaction classes?" or "What is this method supposed
> to do?", can I ask here?
> What is the best way to learn the details of the Cassandra code? I've
> already seen that it has some comments, but they probably won't be enough
> for me.
>
> The strategy I have in mind will be very simple until I finish the MSc.
> After that, I'll improve it with other features and the feedback I got, but
> for the moment, it'll rely on a time interval (probably scheduled at
> specific hours, maybe during a period with less traffic on the system).
> During that interval, the rows will be made unique across all SSTables, but
> only if a prior analysis finds that the row exists in more SSTables than a
> certain threshold.
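
A rough sketch of the selection step described above, under loud assumptions: this is a toy model, not Cassandra code. Each SSTable is reduced to the set of partition keys it contains, "above a certain threshold" is read as strictly greater, and the class and method names (FragmentationSketch, isInWindow, fragmentedKeys) are hypothetical. Hooking this into Cassandra's real compaction classes is left out.

```java
import java.time.LocalTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: an SSTable is represented only by its set of
// partition keys. This is NOT Cassandra's real SSTable API.
class FragmentationSketch {

    // True if 'now' falls inside the maintenance window [start, end).
    // Handles windows that wrap past midnight (e.g. 23:00 to 04:00);
    // start == end is treated as an all-day window.
    static boolean isInWindow(LocalTime now, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        return !now.isBefore(start) || now.isBefore(end);
    }

    // Counts how many SSTables contain each key and returns (sorted) the
    // keys found in strictly more than 'threshold' SSTables.
    static Set<String> fragmentedKeys(List<Set<String>> sstables, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> table : sstables) {
            for (String key : table) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> sstables = List.of(
                Set.of("user:1", "user:2"),
                Set.of("user:1", "user:3"),
                Set.of("user:1", "user:2"));
        LocalTime now = LocalTime.of(2, 30);
        // Only analyse during the low-traffic window (01:00 to 05:00 here).
        if (isInWindow(now, LocalTime.of(1, 0), LocalTime.of(5, 0))) {
            // Keys spread over more than 1 SSTable are candidates to be
            // rewritten into a single SSTable.
            System.out.println(fragmentedKeys(sstables, 1)); // [user:1, user:2]
        }
    }
}
```

Separating the "when" (time window) from the "which rows" (per-key SSTable count) keeps each part easy to test on its own before touching the real compaction code.
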
>
> I suppose it's a naive strategy, but the aim here is to give me experience
> with C*, and of course I'll be happy to take suggestions. But I'll probably
> only use the ideas after delivering the project because, at the moment, I
> need to keep it simple. Otherwise, I'll never be able to deliver the
> project. :)
>
> Sorry for the long email, and thanks for all the help in advance! I'm very
> excited about this project and look forward to being part of this
> community!
>
> Best regards
> Pedro Gordo
>
