Re: Somewhat Weekly Cassandra Dev Wrapup

2017-11-01 Thread Malcolm Taylor
Glad to hear you are finding lgtm.com useful. I work for Semmle, the
company behind lgtm.com.

I see you are interested in checking regularly for new and fixed alerts on
lgtm.com. This can be achieved through our GitHub integration, described at
https://lgtm.com/docs/lgtm/using-lgtm-analysis-continuous-integration, and
is a great way to get more value from the analysis.

Regarding the hashCode violations, I think the relevant query is
https://lgtm.com/projects/g/apache/cassandra/alerts/?mode=tree&severity=error&rule=6770060
which identifies a number of classes that implement equals() without
overriding hashCode(). That would be a good place to find some further
straightforward fixes.
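
To illustrate the kind of alert that query reports, here is a minimal sketch
using a hypothetical value class (Bounds is made up for this example and is
not taken from the Cassandra code base). A class that overrides equals()
should also override hashCode(), or inherit a consistent one from a
superclass, so that equal objects hash to the same bucket:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class, used only for illustration.
final class Bounds
{
    final long left;
    final long right;

    Bounds(long left, long right)
    {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o)
            return true;
        if (!(o instanceof Bounds))
            return false;
        Bounds that = (Bounds) o;
        return left == that.left && right == that.right;
    }

    // Without this override, two equal Bounds objects would almost always
    // get different identity hash codes, and hash-based lookups would miss.
    @Override
    public int hashCode()
    {
        return Objects.hash(left, right);
    }
}

final class EqualsHashCodeDemo
{
    // Adds one instance to a HashSet and looks it up via an equal instance.
    static boolean lookupWorks()
    {
        Set<Bounds> seen = new HashSet<>();
        seen.add(new Bounds(0, 10));
        return seen.contains(new Bounds(0, 10));
    }

    public static void main(String[] args)
    {
        System.out.println(lookupWorks());
    }
}
```

A subclass that inherits a suitable hashCode() from its parent satisfies the
same contract, so an alert on such a class can be a false positive.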

Thanks for the feedback regarding the Range class. I shall pass that on to
our Java team to see what they think. lgtm uses a deep analysis based on a
powerful query language (QL) which runs against a database representing all
of the source code. We are generally able to keep the number of false
positives low, but there are inevitably some that creep through, so we
appreciate the feedback. One of the strengths of our approach is that it is
often quite easy to tweak a query to make it more precise, and thus
eliminate some false positives. It is also possible to suppress individual
alerts if desired.

QL has also proved highly effective at identifying important security flaws
in various systems, including some of the Apache projects. There are lots
of examples of the use of QL in our blog section at https://lgtm.com/blog

- Malcolm


On 1 November 2017 at 01:09, Jeff Beck  wrote:

> On the hashCode violations they are all on
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/Range.java
> which does seem to get the correct hashcode impl from
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/AbstractBounds.java
>
> Jeff
>
>
>


Re: Somewhat Weekly Cassandra Dev Wrapup

2017-11-01 Thread Stefan Podkowinski

> 2) Static Analysis stuff:

I think it's worth mentioning that I also tried to integrate the Error
Prone analyzer (http://errorprone.info/) a while ago as part of
CASSANDRA-13175. Eventually I dropped the ball there due to some
classpath issues, but maybe that can be fixed or worked around.

Having a service like lgtm.com is nice, but ideally I'd like to have a
solution that integrates with CircleCI and clearly indicates new
issues for a proposed patch. Or, at least, a one-click way to check
new code that is about to get committed using an external service.
Easily recognizing issues in new code seems more valuable to me
than having a long report for the complete code base that you
have to filter manually.

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Somewhat Weekly Cassandra Dev Wrapup

2017-11-01 Thread Jeff Jirsa
Ah, I remember that now. Blocked by a Guava bug? 4.0 seems like a good time to
upgrade Guava.

-- 
Jeff Jirsa


> On Nov 1, 2017, at 2:49 AM, Stefan Podkowinski  wrote:
> 
> 
>> 2) Static Analysis stuff:
> 
> I think it's worth mentioning that I also tried to integrate the Error
> Prone analyzer (http://errorprone.info/) a while ago as part of
> CASSANDRA-13175. Eventually I dropped the ball there due to some
> classpath issues, but maybe that can be fixed or worked around.
> 
> Having a service like lgtm.com is nice, but ideally I'd like to have a
> solution that does integrate with circle CI and clearly indicates new
> issues for a proposed patch. Or, at least, have a one-click way to check
> new code that is about to get committed using an external service.
> Easily recognizing issues for new code seems to be more valuable to me,
> instead of having a long report for your complete code base that you
> have to filter manually.
> 
> 




Re: Somewhat Weekly Cassandra Dev Wrapup

2017-11-01 Thread Salih Gedik
Hi,

As an undergrad student I actually question the output of static analysis
tools. Are you actively using them, or do you find projects like Sonar
effective for open source projects like this one? Last I heard, FindBugs is no
longer maintained because the code was hard to maintain. For instance, I checked
one of the “Potential Index Out of bounds” alerts reported by LGTM. This is
listed as a potential one. What is wrong with the snippet below?
(https://lgtm.com/projects/g/apache/cassandra/alerts/?mode=tree&severity=error&rule=2049320662)

<E extends Exception> void forEach(HistogramDataConsumer<E> consumer) throws E
{
    for (int i = 0; i < map.length; i += 2)
    {
        if (map[i] != -1)
        {
            consumer.consume(map[i], map[i + 1]);
        }
    }
}

Thanks a lot!




> On 1 Nov 2017, at 12:53, Jeff Jirsa  wrote:
> 
> Ah, I remember that now. Blocked by a guava bug? 4.0 seems like a good time 
> to upgrade guava.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Nov 1, 2017, at 2:49 AM, Stefan Podkowinski  wrote:
>> 
>> 
>>> 2) Static Analysis stuff:
>> 
>> I think it's worth mentioning that I also tried to integrate the Error
>> Prone analyzer (http://errorprone.info/) a while ago as part of
>> CASSANDRA-13175. Eventually I dropped the ball there due to some
>> classpath issues, but maybe that can be fixed or worked around.
>> 
>> Having a service like lgtm.com is nice, but ideally I'd like to have a
>> solution that does integrate with circle CI and clearly indicates new
>> issues for a proposed patch. Or, at least, have a one-click way to check
>> new code that is about to get committed using an external service.
>> Easily recognizing issues for new code seems to be more valuable to me,
>> instead of having a long report for your complete code base that you
>> have to filter manually.
>> 
>> 
> 
> 



Re: Somewhat Weekly Cassandra Dev Wrapup

2017-11-01 Thread Jeff Jirsa
Good questions. Right now we're not actively using it (at least not
publicly, as far as I know, individual contributors may be using it or
sonar or something else).

For the specific warning (index out of bounds) you point out below: if
map.length were odd, then on the last iteration i would be map.length - 1
and consumer.consume(map[i], map[i + 1]) would read map[map.length], which
is out of bounds. This can only happen if the length is odd, since we
increment i by 2. However, in our case map is initialized with length
capacity * 2 * 2, so it is always even and this potential bug can't
happen. We could be a bit more defensive (which would probably hint to
lgtm that it's impossible) by stopping iteration at map.length - 1 (which
won't change the behavior), or we can just ignore it. So far we've just
ignored it.
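
A minimal sketch of the more defensive loop bound, assuming map is a flat
long[] of key/value pairs with -1 marking an empty key slot. PairScan,
PairConsumer, and countPairs are hypothetical names for this example, and
PairConsumer only stands in for HistogramDataConsumer:

```java
final class PairScan
{
    // Hypothetical stand-in for HistogramDataConsumer in the quoted snippet.
    interface PairConsumer
    {
        void consume(long key, long value);
    }

    // Visits pairs stored flat as [k0, v0, k1, v1, ...].
    // The bound is map.length - 1, so map[i + 1] stays in range even if the
    // array length were odd. For an even length the visited indices are the
    // same as with the original bound i < map.length.
    static void forEach(long[] map, PairConsumer consumer)
    {
        for (int i = 0; i < map.length - 1; i += 2)
        {
            if (map[i] != -1)
                consumer.consume(map[i], map[i + 1]);
        }
    }

    // Counts how many non-empty pairs forEach visits.
    static int countPairs(long[] map)
    {
        int[] count = { 0 };
        forEach(map, (key, value) -> count[0]++);
        return count[0];
    }

    public static void main(String[] args)
    {
        // Even length: pairs (1, 10) and (3, 30) are visited, (-1, 0) is skipped.
        System.out.println(countPairs(new long[]{ 1, 10, -1, 0, 3, 30 })); // prints 2
        // Odd length: the dangling last element is never dereferenced as a key.
        System.out.println(countPairs(new long[]{ 1, 10, 3 }));            // prints 1
    }
}
```

With the original bound, the odd-length array in the second call would throw
ArrayIndexOutOfBoundsException when reading map[3].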

- Jeff

On Wed, Nov 1, 2017 at 5:56 AM, Salih Gedik  wrote:

> Hi,
>
> As an undergrad student I actually question the output of static analysis
> tools. Are you guys actively using it or do you find projects like Sonar
> efficient in such open source projects? Last time I heard that FindBugs are
> no longer maintained because the code was hard to maintain. For instance I
> checked one of the “Potential Index Out of bounds” pointed by LGTM. This is
> listed as a potential one. What is wrong with the snippet below?(
> https://lgtm.com/projects/g/apache/cassandra/alerts/?mode=tree&severity=error&rule=2049320662)
>
>  void forEach(HistogramDataConsumer consumer)
> throws E
> {
> for (int i = 0; i < map.length; i += 2)
> {
> if (map[i] != -1)
> {
> consumer.consume(map[i], map[i + 1]);
> }
> }
> }
>
> Thanks a lot!
>
>
>
>
> > On 1 Nov 2017, at 12:53, Jeff Jirsa  wrote:
> >
> > Ah, I remember that now. Blocked by a guava bug? 4.0 seems like a good
> > time to upgrade guava.
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Nov 1, 2017, at 2:49 AM, Stefan Podkowinski  wrote:
> >>
> >>
> >>> 2) Static Analysis stuff:
> >>
> >> I think it's worth mentioning that I also tried to integrate the Error
> >> Prone analyzer (http://errorprone.info/) a while ago as part of
> >> CASSANDRA-13175. Eventually I dropped the ball there due to some
> >> classpath issues, but maybe that can be fixed or worked around.
> >>
> >> Having a service like lgtm.com is nice, but ideally I'd like to have a
> >> solution that does integrate with circle CI and clearly indicates new
> >> issues for a proposed patch. Or, at least, have a one-click way to check
> >> new code that is about to get committed using an external service.
> >> Easily recognizing issues for new code seems to be more valuable to me,
> >> instead of having a long report for your complete code base that you
> >> have to filter manually.
> >>
> >>
> >
> >
>
>


Re: Making CommitLog pluggable

2017-11-01 Thread 大平怜
Hi Ariel,

CommitLogSegment assumes commit log files are stored on a regular file system.
Our CAPI-Flash system bypasses the OS and accesses flash directly,
so we cannot use the current CommitLogSegment framework as it is.
Intel's SPDK also bypasses the file system, so we think this kind of
requirement is not uncommon.

It would not be easy to reuse AbstractCommitLogSegmentManager, either,
because the archiving and synchronization logic would have to be decoupled.
That would require major rework, and we don't think we should affect
the existing implementation so much.

We do not change any existing CommitLog format.  Our plugin will use
its own format, as it must manage commit logs on the 4KB-block-oriented
address spaces of flash devices.


Regards,
Rei Odaira


2017-10-31 15:38 GMT-05:00 Ariel Weisberg :

> Hi,
>
> There are pluggable elements to the commit log such as those used to
> support mmap or compressed.
>
> Can you describe at a high level what a new implementation would look
> like and why it can't be a mode of the existing implementation?
>
> You are not proposing changing the format correct?
>
> Regards,
> Ariel
>
> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > Hello,
> >
> > We are developing a Cassandra plugin to store CommitLog on our
> > low-latency
> > Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> > must be changed to allow plugins.  Anyone has any thoughts about it?  We
> > have our codebase ready, but we think we should start with high-level
> > discussion.
> >
> > The runtime overhead will be minimal.  The only overhead will be changing
> > method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> > etc.
> > into interface invocations.
> >
> > Synching to CommitLog is one of the performance bottlenecks in Cassandra
> > especially with batch commit.  I think the pluggable CommitLog will allow
> > other interesting alternatives, such as one using SPDK.  Appreciate any
> > comments.
> >
> >
> > Regards,
> > Rei Odaira
>
>
>


Re: Making CommitLog pluggable

2017-11-01 Thread Michael Kjellman
Rei:

One thing that comes up when these types of conversations occur is how the
project can test hardware-dependent code. In the case of the PPC64 stuff,
hardware actually got donated to the ASF so Jenkins runs could be done to check
that things work. Any thoughts on this aspect? Might be a bit premature, but I
thought I'd at least mention it... On the flip side: if CommitLog becomes
pluggable enough, shipping an implementation compatible with the hardware out
of tree might also be viable.

best,
kjellman

> On Nov 1, 2017, at 2:25 PM, 大平怜  wrote:
> 
> Hi Ariel,
> 
> CommitLogSegment assumes commit log files stored on a regular file system.
> Our CAPI Flash system bypasses OS and directly accesses flash,
> so we cannot use the current framework of CommitLogSegment as it is.
> Intel's SPDK also bypasses a file system, so we think this kind of
> requirement
> is not uncommon.
> 
> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> because the archiving and synchronization logics have to be decoupled.
> It would require major rework, and we don't think we should affect
> the existing implementation so much.
> 
> We do not change any existing format of CommitLog.  Our plugin will use
> its own format, as it must manage commit logs on the 4KB-block-oriented
> address spaces of flash devices.
> 
> 
> Regards,
> Rei Odaira
> 
> 
> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
> 
>> Hi,
>> 
>> There are pluggable elements to the commit log such as those used to
>> support mmap or compressed.
>> 
>> Can you describe at a high level what a new implementation would look
>> like and why it can't be a mode of the existing implementation?
>> 
>> You are not proposing changing the format correct?
>> 
>> Regards,
>> Ariel
>> 
>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>>> Hello,
>>> 
>>> We are developing a Cassandra plugin to store CommitLog on our
>>> low-latency
>>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
>>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
>>> have our codebase ready, but we think we should start with high-level
>>> discussion.
>>> 
>>> The runtime overhead will be minimal.  The only overhead will be changing
>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
>>> etc.
>>> into interface invocations.
>>> 
>>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
>>> especially with batch commit.  I think the pluggable CommitLog will allow
>>> other interesting alternatives, such as one using SPDK.  Appreciate any
>>> comments.
>>> 
>>> 
>>> Regards,
>>> Rei Odaira
>> 
>> 
>> 



Re: Making CommitLog pluggable

2017-11-01 Thread 大平怜
Hi Michael,

Yes, testing is always a problem, and that is exactly why we would like to
release our code as a plugin, outside of the main source tree, so that the
project won't need to test the hardware-dependent code.
The pluggable CommitLog will allow this approach.

Actually, we have already released another plugin for CAPI-Flash-based
RowCache, which takes advantage of the pluggable RowCache mechanism:
https://github.com/ppc64le/capi-rowcache
We would just like to repeat this approach in CommitLog.
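
As a rough sketch of what "interface invocations" plus an out-of-tree plugin
could look like: every name below (CommitLogWriter, InMemoryCommitLog,
CommitLogLoader) is hypothetical, the two methods only echo the add() and
getCurrentPosition() calls mentioned earlier in the thread, and the byte[]
and long types are simplifications rather than Cassandra's actual signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin surface. The real CommitLog API is wider than this.
interface CommitLogWriter
{
    // Appends one serialized record and returns the position after it.
    long add(byte[] serializedMutation);

    long getCurrentPosition();
}

// In-memory stand-in for a device-specific implementation shipped out of tree.
final class InMemoryCommitLog implements CommitLogWriter
{
    private final List<byte[]> records = new ArrayList<>();
    private long position;

    public InMemoryCommitLog()
    {
    }

    @Override
    public synchronized long add(byte[] serializedMutation)
    {
        records.add(serializedMutation.clone());
        position += serializedMutation.length;
        return position;
    }

    @Override
    public synchronized long getCurrentPosition()
    {
        return position;
    }
}

final class CommitLogLoader
{
    // Instantiates the implementation named by a (hypothetical) config value.
    static CommitLogWriter load(String className)
    {
        try
        {
            return Class.forName(className)
                        .asSubclass(CommitLogWriter.class)
                        .getDeclaredConstructor()
                        .newInstance();
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalArgumentException("Cannot load commit log: " + className, e);
        }
    }

    public static void main(String[] args)
    {
        CommitLogWriter log = load(InMemoryCommitLog.class.getName());
        log.add(new byte[20]);
        System.out.println(log.getCurrentPosition()); // prints 20
    }
}
```

Callers only see the interface, so the core project would not need to compile
or test the device-specific class.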


Thanks,
Rei Odaira


2017-11-01 16:30 GMT-05:00 Michael Kjellman :

> Rei:
>
> One thing that comes up when these type of conversations occur is how the
> project can test hardware dependent code. In the case of the PPC64 stuff,
> hardware actually got donated to the ASF so Jenkins runs could be done to
> check that things work. Any thoughts on this aspect? Might be a bit
> pre-mature, but I thought I'd at least mention it... On the flip side: if
> CommitLog becomes pluggable enough, shipping an implementation compatible
> with the hardware out of tree might also be viable too.
>
> best,
> kjellman
>
> > On Nov 1, 2017, at 2:25 PM, 大平怜  wrote:
> >
> > Hi Ariel,
> >
> > CommitLogSegment assumes commit log files stored on a regular file system.
> > Our CAPI Flash system bypasses OS and directly accesses flash,
> > so we cannot use the current framework of CommitLogSegment as it is.
> > Intel's SPDK also bypasses a file system, so we think this kind of
> > requirement
> > is not uncommon.
> >
> > It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> > because the archiving and synchronization logics have to be decoupled.
> > It would require major rework, and we don't think we should affect
> > the existing implementation so much.
> >
> > We do not change any existing format of CommitLog.  Our plugin will use
> > its own format, as it must manage commit logs on the 4KB-block-oriented
> > address spaces of flash devices.
> >
> >
> > Regards,
> > Rei Odaira
> >
> >
> > 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
> >
> >> Hi,
> >>
> >> There are pluggable elements to the commit log such as those used to
> >> support mmap or compressed.
> >>
> >> Can you describe at a high level what a new implementation would look
> >> like and why it can't be a mode of the existing implementation?
> >>
> >> You are not proposing changing the format correct?
> >>
> >> Regards,
> >> Ariel
> >>
> >> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> >>> Hello,
> >>>
> >>> We are developing a Cassandra plugin to store CommitLog on our
> >>> low-latency
> >>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> >>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
> >>> have our codebase ready, but we think we should start with high-level
> >>> discussion.
> >>>
> >>> The runtime overhead will be minimal.  The only overhead will be changing
> >>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> >>> etc.
> >>> into interface invocations.
> >>>
> >>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
> >>> especially with batch commit.  I think the pluggable CommitLog will allow
> >>> other interesting alternatives, such as one using SPDK.  Appreciate any
> >>> comments.
> >>>
> >>>
> >>> Regards,
> >>> Rei Odaira
> >>
> >>
> >>
>
>


Re: Making CommitLog pluggable

2017-11-01 Thread Michael Kjellman
Awesome!! You're two steps ahead ;)

Not sure if you're allowed to share, but can you highlight any details on
endurance and performance? Are the pages 4 KB or 16 KB? How many writes do you
expect the device to handle over a one-year window? I assume that because you're
accessing the hardware directly as a block device, there are different rules
regarding fsync and how things are flushed? Any power loss protection features,
etc.? If you write a commit log segment that's only about 20 bytes (for example),
will you pad the entire thing internally and still need to write 4 KB (or
whatever the physical page size is)?
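
To make the padding question concrete, here is the arithmetic as a small
sketch. It assumes a 4 KB (4096-byte) block size and that writes are rounded
up to whole blocks. BlockPadding and paddedSize are hypothetical names, and
whether the plugin actually pads this way is not stated in this thread.

```java
final class BlockPadding
{
    // Assumed block size, taken from the "4KB-block-oriented" description.
    static final int BLOCK_SIZE = 4096;

    // Rounds a payload size up to the next multiple of BLOCK_SIZE.
    static long paddedSize(long payloadBytes)
    {
        if (payloadBytes < 0)
            throw new IllegalArgumentException("payloadBytes must be >= 0");
        return (payloadBytes + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
    }

    public static void main(String[] args)
    {
        System.out.println(paddedSize(20));   // prints 4096
        System.out.println(paddedSize(4096)); // prints 4096
        System.out.println(paddedSize(4097)); // prints 8192
    }
}
```

Under these assumptions a 20-byte record would still cost one full 4096-byte
block write.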

Thanks!

best,
kjellman

> On Nov 1, 2017, at 2:40 PM, 大平怜  wrote:
> 
> Hi Michael,
> 
> Yes, testing is always a problem, and that is exactly why we would like to
> release
> our code as a plugin, outside of the main source tree, so that the project
> won't
> need to test the hardware-dependent code.
> The pluggable CommitLog will allow this approach.
> 
> Actually, we have already released another plugin for CAPI-Flash-based
> RowCache,
> which takes advantage of the pluggable RowCache mechanism.
> https://github.com/ppc64le/capi-rowcache
> We would just like to repeat this approach in CommitLog.
> 
> 
> Thanks,
> Rei Odaira
> 
> 
> 2017-11-01 16:30 GMT-05:00 Michael Kjellman :
> 
>> Rei:
>> 
>> One thing that comes up when these type of conversations occur is how the
>> project can test hardware dependent code. In the case of the PPC64 stuff,
>> hardware actually got donated to the ASF so Jenkins runs could be done to
>> check that things work. Any thoughts on this aspect? Might be a bit
>> pre-mature, but I thought I'd at least mention it... On the flip side: if
>> CommitLog becomes pluggable enough, shipping an implementation compatible
>> with the hardware out of tree might also be viable too.
>> 
>> best,
>> kjellman
>> 
>>> On Nov 1, 2017, at 2:25 PM, 大平怜  wrote:
>>> 
>>> Hi Ariel,
>>> 
>>> CommitLogSegment assumes commit log files stored on a regular file system.
>>> Our CAPI Flash system bypasses OS and directly accesses flash,
>>> so we cannot use the current framework of CommitLogSegment as it is.
>>> Intel's SPDK also bypasses a file system, so we think this kind of
>>> requirement
>>> is not uncommon.
>>> 
>>> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
>>> because the archiving and synchronization logics have to be decoupled.
>>> It would require major rework, and we don't think we should affect
>>> the existing implementation so much.
>>> 
>>> We do not change any existing format of CommitLog.  Our plugin will use
>>> its own format, as it must manage commit logs on the 4KB-block-oriented
>>> address spaces of flash devices.
>>> 
>>> 
>>> Regards,
>>> Rei Odaira
>>> 
>>> 
>>> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
>>> 
>>>> Hi,
>>>>
>>>> There are pluggable elements to the commit log such as those used to
>>>> support mmap or compressed.
>>>>
>>>> Can you describe at a high level what a new implementation would look
>>>> like and why it can't be a mode of the existing implementation?
>>>>
>>>> You are not proposing changing the format correct?
>>>>
>>>> Regards,
>>>> Ariel
>>>>
>>>> On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
>>>>> Hello,
>>>>>
>>>>> We are developing a Cassandra plugin to store CommitLog on our
>>>>> low-latency
>>>>> Flash device (CAPI-Flash).  To do that, the original CommitLog interface
>>>>> must be changed to allow plugins.  Anyone has any thoughts about it?  We
>>>>> have our codebase ready, but we think we should start with high-level
>>>>> discussion.
>>>>>
>>>>> The runtime overhead will be minimal.  The only overhead will be changing
>>>>> method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
>>>>> etc.
>>>>> into interface invocations.
>>>>>
>>>>> Synching to CommitLog is one of the performance bottlenecks in Cassandra
>>>>> especially with batch commit.  I think the pluggable CommitLog will allow
>>>>> other interesting alternatives, such as one using SPDK.  Appreciate any
>>>>> comments.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Rei Odaira
>>>>
>> 
>> 



Re: Making CommitLog pluggable

2017-11-01 Thread Ariel Weisberg
Hi,

OK. It makes sense that most of the existing plumbing is not applicable
since it operates on a filesystem.

How does replay work? Presumably you will need to refactor
CommitLogReplayer as well?

I think the best way for us to decide whether it's something we want in
tree is to see a patch. You would need to do this even if it doesn't
make it in tree and you end up having to deploy a patched build.

Pluggability is a little bit of a touchy subject because we don't want
to directly or indirectly become responsible for interfaces to out of
tree implementations. I don't know if there is consensus around this,
but I think even if we made the commit log pluggable it would be with
the understanding that we may change the API even in bug fix releases.

Down the line, this becomes tricky when unmaintained out-of-tree
implementations that people depend on are broken by interface
changes and no one is around to fix them. People who depend on
the out-of-tree implementation have no one to complain to but us. This
becomes even more likely when the maintainers aren't using the latest
version of C* and are busy with other things.

You are characterizing the API as being just a few methods on CommitLog
but that isn't true. 

These are the imports for CommitLogReplayer

import org.apache.cassandra.concurrent.Stage;
import org.apache.cassandra.concurrent.StageManager;
import org.apache.cassandra.config.CFMetaData;
import org.apache.cassandra.config.Schema;
import org.apache.cassandra.db.*;
import org.apache.cassandra.io.util.FastByteArrayInputStream;
import org.apache.cassandra.io.util.FileUtils;
import org.apache.cassandra.io.util.RandomAccessReader;
import org.apache.cassandra.utils.*;

And these are the imports for CommitLog

import org.apache.cassandra.config.Config;
import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.db.*;
import org.apache.cassandra.io.FSWriteError;
import org.apache.cassandra.io.sstable.SSTableDeletingTask;
import org.apache.cassandra.io.util.DataOutputByteBuffer;
import org.apache.cassandra.metrics.CommitLogMetrics;
import org.apache.cassandra.net.MessagingService;
import org.apache.cassandra.service.StorageService;
import org.apache.cassandra.utils.JVMStabilityInspector;

If a bug fix release changes even one line in CommitLog or
CommitLogReplayer, it's probably going to break your
plugin JAR. Anyone running it in production will then have to fix it and
recompile, or be unable to get bug fixes.

Regards,
Ariel
On Wed, Nov 1, 2017, at 05:25 PM, 大平怜 wrote:
> Hi Ariel,
> 
> CommitLogSegment assumes commit log files stored on a regular file
> system.
> Our CAPI Flash system bypasses OS and directly accesses flash,
> so we cannot use the current framework of CommitLogSegment as it is.
> Intel's SPDK also bypasses a file system, so we think this kind of
> requirement
> is not uncommon.
> 
> It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> because the archiving and synchronization logics have to be decoupled.
> It would require major rework, and we don't think we should affect
> the existing implementation so much.
> 
> We do not change any existing format of CommitLog.  Our plugin will use
> its own format, as it must manage commit logs on the 4KB-block-oriented
> address spaces of flash devices.
> 
> 
> Regards,
> Rei Odaira
> 
> 
> 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
> 
> > Hi,
> >
> > There are pluggable elements to the commit log such as those used to
> > support mmap or compressed.
> >
> > Can you describe at a high level what a new implementation would look
> > like and why it can't be a mode of the existing implementation?
> >
> > You are not proposing changing the format correct?
> >
> > Regards,
> > Ariel
> >
> > On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > > Hello,
> > >
> > > We are developing a Cassandra plugin to store CommitLog on our
> > > low-latency
> > > Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> > > must be changed to allow plugins.  Anyone has any thoughts about it?  We
> > > have our codebase ready, but we think we should start with high-level
> > > discussion.
> > >
> > > The runtime overhead will be minimal.  The only overhead will be changing
> > > method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> > > etc.
> > > into interface invocations.
> > >
> > > Synching to CommitLog is one of the performance bottlenecks in Cassandra
> > > especially with batch commit.  I think the pluggable CommitLog will allow
> > > other interesting alternatives, such as one using SPDK.  Appreciate any
> > > comments.
> > >
> > >
> > > Regards,
> > > Rei Odaira
> >
> >
> >


Re: Making CommitLog pluggable

2017-11-01 Thread Ariel Weisberg
Hi,

Just so I don't seem too negative, what I would really like to see is an
in tree implementation. The real challenge there is that the hardware is
not widely available. If it were something you could get in GCE or AWS
or at least get via an emulator that would be a different story.

Ariel

On Wed, Nov 1, 2017, at 06:46 PM, Ariel Weisberg wrote:
> Hi,
> 
> OK. It makes sense that most of the existing plumbing is not applicable
> since it operates on a filesystem.
> 
> How does replay work? Presumably you will need to refactor
> CommitLogReplayer as well?
> 
> I think the best way for us to decide whether it's something we want in
> tree is to see a patch. You would need to do this even if it doesn't
> make it in tree and you end up having to deploy a patched build.
> 
> Pluggability is a little bit of a touchy subject because we don't want
> to directly or indirectly become responsible for interfaces to out of
> tree implementations. I don't know if there is consensus around this,
> but I think even if we made the commit log pluggable it would be with
> the understanding that we may change the API even in bug fix releases.
> 
> Down the line where this becomes tricky is unmaintained out of tree
> implementations that people depend on being broken due to interface
> changes and then no one being around to fix them. People who depend on
> the out of tree implementation have no one to complain to but us. This
> becomes even more likely when the maintainers aren't using the latest
> version of C* and are busy with other things.
> 
> You are characterizing the API as being just a few methods on CommitLog
> but that isn't true. 
> 
> These are the imports for CommitLogReplayer
> 
> import org.apache.cassandra.concurrent.Stage;
> import org.apache.cassandra.concurrent.StageManager;
> import org.apache.cassandra.config.CFMetaData;
> import org.apache.cassandra.config.Schema;
> import org.apache.cassandra.db.*;
> import org.apache.cassandra.io.util.FastByteArrayInputStream;
> import org.apache.cassandra.io.util.FileUtils;
> import org.apache.cassandra.io.util.RandomAccessReader;
> import org.apache.cassandra.utils.*;
> 
> And these are the imports for CommitLog
> 
> import org.apache.cassandra.config.Config;
> import org.apache.cassandra.config.DatabaseDescriptor;
> import org.apache.cassandra.db.*;
> import org.apache.cassandra.io.FSWriteError;
> import org.apache.cassandra.io.sstable.SSTableDeletingTask;
> import org.apache.cassandra.io.util.DataOutputByteBuffer;
> import org.apache.cassandra.metrics.CommitLogMetrics;
> import org.apache.cassandra.net.MessagingService;
> import org.apache.cassandra.service.StorageService;
> import org.apache.cassandra.utils.JVMStabilityInspector;
> 
> If we change any code that changes a line in CommitLog or
> CommitLogReplayer in a bug fix release it's probably going to break your
> plugin JAR. Anyone running it in production will now have to fix it and
> recompile or be unable to get bug fixes.
> 
> Regards,
> Ariel
> On Wed, Nov 1, 2017, at 05:25 PM, 大平怜 wrote:
> > Hi Ariel,
> > 
> > CommitLogSegment assumes commit log files stored on a regular file
> > system.
> > Our CAPI Flash system bypasses OS and directly accesses flash,
> > so we cannot use the current framework of CommitLogSegment as it is.
> > Intel's SPDK also bypasses a file system, so we think this kind of
> > requirement
> > is not uncommon.
> > 
> > It would not be easy to reuse AbstractCommitLogSegmentManager, either,
> > because the archiving and synchronization logics have to be decoupled.
> > It would require major rework, and we don't think we should affect
> > the existing implementation so much.
> > 
> > We do not change any existing format of CommitLog.  Our plugin will use
> > its own format, as it must manage commit logs on the 4KB-block-oriented
> > address spaces of flash devices.
> > 
> > 
> > Regards,
> > Rei Odaira
> > 
> > 
> > 2017-10-31 15:38 GMT-05:00 Ariel Weisberg :
> > 
> > > Hi,
> > >
> > > There are pluggable elements to the commit log such as those used to
> > > support mmap or compressed.
> > >
> > > Can you describe at a high level what a new implementation would look
> > > like and why it can't be a mode of the existing implementation?
> > >
> > > You are not proposing changing the format correct?
> > >
> > > Regards,
> > > Ariel
> > >
> > > On Tue, Oct 31, 2017, at 04:09 PM, 大平怜 wrote:
> > > > Hello,
> > > >
> > > > We are developing a Cassandra plugin to store CommitLog on our
> > > > low-latency
> > > > Flash device (CAPI-Flash).  To do that, the original CommitLog interface
> > > > must be changed to allow plugins.  Anyone has any thoughts about it?  We
> > > > have our codebase ready, but we think we should start with high-level
> > > > discussion.
> > > >
> > > > The runtime overhead will be minimal.  The only overhead will be 
> > > > changing
> > > > method invocations to CommitLog#add(), CommitLog#getCurrentPosition(),
> > > > etc.
> > > > into interface invocations.
> > >