Any code-specific questions can be asked here or in #cassandra-dev on freenode.
Discussion regarding usefulness etc. is probably best kept in a JIRA ticket.

/Marcus

On Mon, Jul 11, 2016 at 7:06 PM, Pedro Gordo <pedro.gordo1...@gmail.com> wrote:

> Hi all
>
> I'm finishing an MSc in which my final project is to implement a new
> compaction strategy in Cassandra. I've discussed the main points of the
> strategy with other community members and received valuable feedback.
> I understand this will be a tough challenge for someone who has never
> worked with Cassandra, but after getting to know the technology, I've
> found it fascinating. This, combined with always wanting to contribute to
> an open source project, led me to choose it as the topic for my MSc project.
>
> But because this is my first time contributing to an open source project,
> I have some questions on how to proceed correctly. Looking at the Contribute
> <http://wiki.apache.org/cassandra/HowToContribute> page, I see that we're
> supposed to create a ticket before starting work on it, so should I just
> create one, or does the strategy's usefulness need to be validated by
> someone first? In this case, should I just proceed and implement it, or do
> something else? And finally, is this the correct mailing list for asking
> these sorts of questions? :)
>
> As for the code itself, in case I have a question like "Should we be using
> an abstract class for compaction classes?" or "What is this method supposed
> to do?", can I ask here?
> What is the best course of action to learn about the details of the code in
> Cassandra? I already saw that it has some comments, but they probably won't
> be enough for me.
>
> The strategy I have in mind will be very simple until I finish the MSc.
> After that, I'll improve it with other features and the feedback I got, but
> for the moment, it'll rely on a time interval (probably scheduled at
> specific hours, maybe during a time with less traffic on the system).
> During that time interval, the rows will be made unique across all
> SSTables, but only if a prior analysis finds that the row exists in more
> than a certain threshold number of SSTables.
>
> I suppose it's a naive strategy, but the aim here is to give me experience
> with C*, and of course I'll be happy to take suggestions. But I'll probably
> only use the ideas after delivering the project because, at the moment, I
> need to keep it simple. Otherwise, I'll never be able to deliver the
> project. :)
>
> Sorry for the long email, and thanks for all the help in advance! I'm very
> excited about this project and look forward to being part of this
> community!
>
> Best regards
> Pedro Gordo
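
For readers following the thread, the "prior analysis" step described in the quoted message could be modelled roughly as below. This is only a toy sketch in plain Python, not Cassandra code and not the proposed implementation: the names `rows_to_compact` and `sstables_holding`, the dictionary that maps an SSTable name to its row keys, and the reading of "above a certain threshold" as "strictly more than `threshold` SSTables" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Collection, Dict, Set


def rows_to_compact(sstables: Dict[str, Collection[str]], threshold: int) -> Set[str]:
    """Return the row keys that appear in more than `threshold` SSTables.

    `sstables` is a hypothetical in-memory model: SSTable name -> row keys.
    """
    counts: Counter = Counter()
    for keys in sstables.values():
        # Count each key at most once per SSTable.
        counts.update(set(keys))
    return {key for key, n in counts.items() if n > threshold}


def sstables_holding(sstables: Dict[str, Collection[str]], selected: Set[str]) -> Set[str]:
    """Return the names of the SSTables that contain at least one selected row key."""
    return {name for name, keys in sstables.items() if selected & set(keys)}


# Toy data: row "a" is spread over three SSTables, row "b" over two.
sstables = {
    "sstable-1": ["a", "b"],
    "sstable-2": ["a", "c"],
    "sstable-3": ["a", "b", "d"],
}
selected = rows_to_compact(sstables, threshold=2)
candidates = sstables_holding(sstables, selected)
```

With a threshold of 2, only row "a" is selected, and all three SSTables become candidates for the scheduled compaction window. A real strategy would have to obtain these counts from Cassandra's own SSTable metadata rather than from full key lists held in memory.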