Re: CASSANDRA-13241 lower default chunk_length_in_kb
Fwiw, as much as I agree this is a change worth doing in general, I am -0 for 4.0. Both the "compact sequencing" and the change of default, really. We're closing in on 2 months into the freeze, and for me a freeze does include not changing defaults, because changing a default ideally implies a decent amount of analysis/benchmarking of the consequences of that change[1], and that doesn't fit my definition of a freeze. [1]: to be extra clear, I'm not saying we've always done this, far from it. But I hope we can all agree we were wrong not to do it when we didn't, and should strive to improve, not repeat past mistakes. -- Sylvain On Thu, Oct 18, 2018 at 8:55 PM Ariel Weisberg wrote: > Hi, > > For those who were asking about the performance impact of block size on > compression I wrote a microbenchmark. > > https://pastebin.com/RHDNLGdC > > [java] Benchmark Mode > Cnt Score Error Units > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k thrpt > 15 331190055.685 ± 8079758.044 ops/s > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k thrpt > 15 353024925.655 ± 7980400.003 ops/s > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k thrpt > 15 365664477.654 ± 10083336.038 ops/s > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k thrpt > 15 305518114.172 ± 11043705.883 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k thrpt > 15 688369529.911 ± 25620873.933 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k thrpt > 15 703635848.895 ± 5296941.704 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k thrpt > 15 695537044.676 ± 17400763.731 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k thrpt > 15 727725713.128 ± 4252436.331 ops/s > > To summarize, compression is 8.5% slower and decompression is 1% faster. > This is measuring the impact on compression/decompression not the huge > impact that would occur if we decompressed data we don't need less often. > > I didn't test decompression of Snappy and LZ4 high, but I did test > compression. > > Snappy: > [java] CompactIntegerSequenceBench.benchCompressSnappy16k thrpt > 2 196574766.116 ops/s > [java] CompactIntegerSequenceBench.benchCompressSnappy32k thrpt > 2 198538643.844 ops/s > [java] CompactIntegerSequenceBench.benchCompressSnappy64k thrpt > 2 194600497.613 ops/s > [java] CompactIntegerSequenceBench.benchCompressSnappy8k thrpt > 2 186040175.059 ops/s > > LZ4 high compressor: > [java] CompactIntegerSequenceBench.bench16k thrpt 2 > 20822947.578 ops/s > [java] CompactIntegerSequenceBench.bench32k thrpt 2 > 12037342.253 ops/s > [java] CompactIntegerSequenceBench.bench64k thrpt 2 > 6782534.469 ops/s > [java] CompactIntegerSequenceBench.bench8k thrpt 2 > 32254619.594 ops/s > > LZ4 high is the one instance where block size mattered a lot. It's a bit > suspicious really when you look at the ratio of performance to block size > being close to 1:1. I couldn't spot a bug in the benchmark though.
> > Compression ratios with LZ4 fast for the text of Alice in Wonderland was: > > Chunk size 8192, ratio 0.709473 > Chunk size 16384, ratio 0.667236 > Chunk size 32768, ratio 0.634735 > Chunk size 65536, ratio 0.607208 > > By way of comparison I also ran deflate with maximum compression: > > Chunk size 8192, ratio 0.426434 > Chunk size 16384, ratio 0.402423 > Chunk size 32768, ratio 0.381627 > Chunk size 65536, ratio 0.364865 > > Ariel > > On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote: > > FWIW, I’m not -0, just think that long after the freeze date a change > > like this needs a strong mandate from the community. I think the change > > is a good one. > > > > > > > > > > > > > On 17 Oct 2018, at 22:09, Ariel Weisberg wrote: > > > > > > Hi, > > > > > > It's really not appreciably slower compared to the decompression we > are going to do which is going to take several microseconds. Decompression > is also going to be faster because we are going to do less unnecessary > decompression and the decompression itself may be faster since it may fit > in a higher level cache better. I ran a microbenchmark comparing them. > > > > > > > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988 > > > > > > Fetching a long from memory: 56 nanoseconds > > > Compact integer sequence : 80 nanoseconds > > > Summing integer sequence : 165 nanoseconds > > > > > > Currently we have one +1 from Kurt to change the representation and > possibly a -0 from Benedict. That's not really enough to make an exception > to the code freeze. If you want it to happen (or not) you n
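The ratio measurement quoted above is easy to reproduce outside the JMH harness. A minimal standalone sketch, assuming the lz4-java library Cassandra already bundles (file path and class name are invented for illustration; this is not the pastebin benchmark itself):

    import java.nio.file.Files;
    import java.nio.file.Paths;

    import net.jpountz.lz4.LZ4Compressor;
    import net.jpountz.lz4.LZ4Factory;

    public class ChunkRatioCheck
    {
        public static void main(String[] args) throws Exception
        {
            // Any reasonably large text file will do; the default path is a placeholder.
            byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "alice.txt"));
            LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();

            for (int chunkSize : new int[]{ 8192, 16384, 32768, 65536 })
            {
                byte[] out = new byte[compressor.maxCompressedLength(chunkSize)];
                long raw = 0, compressed = 0;
                // Compress the file chunk by chunk, the way SSTable compression does,
                // and report total compressed size / total raw size.
                for (int off = 0; off < data.length; off += chunkSize)
                {
                    int len = Math.min(chunkSize, data.length - off);
                    compressed += compressor.compress(data, off, len, out, 0, out.length);
                    raw += len;
                }
                System.out.printf("Chunk size %d, ratio %f%n", chunkSize, (double) compressed / raw);
            }
        }
    }

Swapping fastCompressor() for highCompressor() gives the LZ4-high variant discussed above.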
Re: CASSANDRA-13241 lower default chunk_length_in_kb
Agree with Sylvain (and I think Benedict) - there’s no compelling reason to violate the freeze here. We’ve had the wrong default for years - add a note to the docs that we’ll be changing it in the future, but let’s not violate the freeze now. -- Jeff Jirsa > On Oct 19, 2018, at 10:06 AM, Sylvain Lebresne wrote: > > Fwiw, as much as I agree this is a change worth doing in general, I do am > -0 for 4.0. Both the "compact sequencing" and the change of default really. > We're closing on 2 months within the freeze, and for me a freeze do include > not changing defaults, because changing default ideally imply a decent > amount of analysis/benchmark of the consequence of that change[1] and that > doesn't enter in my definition of a freeze. > > [1]: to be extra clear, I'm not saying we've always done this, far from it. > But I hope we can all agree we were wrong to no do it when we didn't and > should strive to improve, not repeat past mistakes. > -- > Sylvain > > >> On Thu, Oct 18, 2018 at 8:55 PM Ariel Weisberg wrote: >> >> Hi, >> >> For those who were asking about the performance impact of block size on >> compression I wrote a microbenchmark. >> >> https://pastebin.com/RHDNLGdC >> >> [java] Benchmark Mode >> Cnt Score Error Units >> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16kthrpt >> 15 331190055.685 ± 8079758.044 ops/s >> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32kthrpt >> 15 353024925.655 ± 7980400.003 ops/s >> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64kthrpt >> 15 365664477.654 ± 10083336.038 ops/s >> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k thrpt >> 15 305518114.172 ± 11043705.883 ops/s >> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k thrpt >> 15 688369529.911 ± 25620873.933 ops/s >> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k thrpt >> 15 703635848.895 ± 5296941.704 ops/s >> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k thrpt >> 15 695537044.676 ± 17400763.731 ops/s >> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k thrpt >> 15 727725713.128 ± 4252436.331 ops/s >> >> To summarize, compression is 8.5% slower and decompression is 1% faster. >> This is measuring the impact on compression/decompression not the huge >> impact that would occur if we decompressed data we don't need less often. >> >> I didn't test decompression of Snappy and LZ4 high, but I did test >> compression. >> >> Snappy: >> [java] CompactIntegerSequenceBench.benchCompressSnappy16k thrpt >> 2 196574766.116 ops/s >> [java] CompactIntegerSequenceBench.benchCompressSnappy32k thrpt >> 2 198538643.844 ops/s >> [java] CompactIntegerSequenceBench.benchCompressSnappy64k thrpt >> 2 194600497.613 ops/s >> [java] CompactIntegerSequenceBench.benchCompressSnappy8kthrpt >> 2 186040175.059 ops/s >> >> LZ4 high compressor: >> [java] CompactIntegerSequenceBench.bench16k thrpt2 >> 20822947.578 ops/s >> [java] CompactIntegerSequenceBench.bench32k thrpt2 >> 12037342.253 ops/s >> [java] CompactIntegerSequenceBench.bench64k thrpt2 >> 6782534.469 ops/s >> [java] CompactIntegerSequenceBench.bench8k thrpt2 >> 32254619.594 ops/s >> >> LZ4 high is the one instance where block size mattered a lot. It's a bit >> suspicious really when you look at the ratio of performance to block size >> being close to 1:1. I couldn't spot a bug in the benchmark though. 
>> >> Compression ratios with LZ4 fast for the text of Alice in Wonderland was: >> >> Chunk size 8192, ratio 0.709473 >> Chunk size 16384, ratio 0.667236 >> Chunk size 32768, ratio 0.634735 >> Chunk size 65536, ratio 0.607208 >> >> By way of comparison I also ran deflate with maximum compression: >> >> Chunk size 8192, ratio 0.426434 >> Chunk size 16384, ratio 0.402423 >> Chunk size 32768, ratio 0.381627 >> Chunk size 65536, ratio 0.364865 >> >> Ariel >> >>> On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote: >>> FWIW, I’m not -0, just think that long after the freeze date a change >>> like this needs a strong mandate from the community. I think the change >>> is a good one. >>> >>> >>> >>> >>> On 17 Oct 2018, at 22:09, Ariel Weisberg wrote: Hi, It's really not appreciably slower compared to the decompression we >> are going to do which is going to take several microseconds. Decompression >> is also going to be faster because we are going to do less unnecessary >> decompression and the decompression itself may be faster since it may fit >> in a higher level cache better. I ran a microbenchmark comparing them. >> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:
Re: Built in trigger: double-write for app migration
It reminds me of “shadow writes” described in [1]. During data migration the coordinator forwards a copy of any write request regarding tokens that are being transferred to the new node. [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17, https://ieeexplore.ieee.org/document/8069080 > On 18 Oct 2018, at 18:53, Carl Mueller > wrote: > > tl;dr: a generic trigger on TABLES that will mirror all writes to > facilitate data migrations between clusters or systems. What is necessary > to ensure full write mirroring/coherency? > > When cassandra clusters have several "apps" aka keyspaces serving > applications colocated on them, but the app/keyspace bandwidth and size > demands begin impacting other keyspaces/apps, then one strategy is to > migrate the keyspace to its own dedicated cluster. > > With backups/sstableloading, this will entail a delay and therefore a > "coherency" shortfall between the clusters. So typically one would employ a > "double write, read once": > > - all updates are mirrored to both clusters > - writes come from the current most coherent. > > Often two sstable loads are done: > > 1) first load > 2) turn on double writes/write mirroring > 3) a second load is done to finalize coherency > 4) switch the app to point to the new cluster now that it is coherent > > The double writes and read is the sticking point. We could do it at the app > layer, but if the app wasn't written with that, it is a lot of testing and > customization specific to the framework. > > We could theoretically do some sort of proxying of the java-driver somehow, > but all the async structures and complex interfaces/apis would be difficult > to proxy. Maybe there is a lower level in the java-driver that is possible. > This also would only apply to the java-driver, and not > python/go/javascript/other drivers. > > Finally, I suppose we could do a trigger on the tables. It would be really > nice if we could add to the cassandra toolbox the basics of a write > mirroring trigger that could be activated "fairly easily"... now I know > there are the complexities of inter-cluster access, and if we are even > using cassandra as the target mirror system (for example there is an > article on triggers write-mirroring to kafka: > https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1). > > And this starts to get into the complexities of hinted handoff as well. But > fundamentally this seems something that would be a very nice feature > (especially when you NEED it) to have in the core of cassandra. > > Finally, is the mutation hook in triggers sufficient to track all incoming > mutations (outside of "shudder" other triggers generating data)
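For concreteness, the trigger-based option being floated would hang off the trigger API that already exists. A rough sketch only, written against the 3.x/4.0 ITrigger interface, with the forwarding transport left abstract (RemoteForwarder is a made-up placeholder, not an existing class):

    import java.util.Collection;
    import java.util.Collections;

    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.partitions.Partition;
    import org.apache.cassandra.triggers.ITrigger;

    public class MirrorTrigger implements ITrigger
    {
        // Placeholder for whatever carries the copy: a driver session against the
        // target cluster, a Kafka producer, etc.
        private final RemoteForwarder forwarder = new RemoteForwarder();

        public Collection<Mutation> augment(Partition update)
        {
            // Hand the incoming partition update to the external sink and return
            // no additional local mutations.
            forwarder.send(update);
            return Collections.emptyList();
        }
    }

attached with CREATE TRIGGER mirror_writes ON ks.table_name USING 'MirrorTrigger'; The hard questions raised above (hints, retries, what "coherent enough" means) live in the forwarder, not in the trigger hook itself.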
Re: Proposing an Apache Cassandra Management process
> Can you share the link to cwiki if you have started it ? > I haven't. But I'll try to put together a strawman proposal for the doc(s) over the weekend. Regards, Mick
Re: CASSANDRA-13241 lower default chunk_length_in_kb
The change of default property doesn’t seem to violate the freeze? The predominant phrased used in that thread was 'feature freeze'. A lot of people are now interpreting it more broadly, so perhaps we need to revisit, but that’s probably a separate discussion? The current default is really bad for most users, so I’m +1 changing it. Especially as the last time this topic was raised was (iirc) around the 3.0 freeze. We decided not to change anything for similar reasons, and haven't revisited it since. > On 19 Oct 2018, at 09:25, Jeff Jirsa wrote: > > Agree with Sylvain (and I think Benedict) - there’s no compelling reason to > violate the freeze here. We’ve had the wrong default for years - add a note > to the docs that we’ll be changing it in the future, but let’s not violate > the freeze now. > > -- > Jeff Jirsa > > >> On Oct 19, 2018, at 10:06 AM, Sylvain Lebresne wrote: >> >> Fwiw, as much as I agree this is a change worth doing in general, I do am >> -0 for 4.0. Both the "compact sequencing" and the change of default really. >> We're closing on 2 months within the freeze, and for me a freeze do include >> not changing defaults, because changing default ideally imply a decent >> amount of analysis/benchmark of the consequence of that change[1] and that >> doesn't enter in my definition of a freeze. >> >> [1]: to be extra clear, I'm not saying we've always done this, far from it. >> But I hope we can all agree we were wrong to no do it when we didn't and >> should strive to improve, not repeat past mistakes. >> -- >> Sylvain >> >> >>> On Thu, Oct 18, 2018 at 8:55 PM Ariel Weisberg wrote: >>> >>> Hi, >>> >>> For those who were asking about the performance impact of block size on >>> compression I wrote a microbenchmark. >>> >>> https://pastebin.com/RHDNLGdC >>> >>>[java] Benchmark Mode >>> Cnt Score Error Units >>>[java] CompactIntegerSequenceBench.benchCompressLZ4Fast16kthrpt >>> 15 331190055.685 ± 8079758.044 ops/s >>>[java] CompactIntegerSequenceBench.benchCompressLZ4Fast32kthrpt >>> 15 353024925.655 ± 7980400.003 ops/s >>>[java] CompactIntegerSequenceBench.benchCompressLZ4Fast64kthrpt >>> 15 365664477.654 ± 10083336.038 ops/s >>>[java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k thrpt >>> 15 305518114.172 ± 11043705.883 ops/s >>>[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k thrpt >>> 15 688369529.911 ± 25620873.933 ops/s >>>[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k thrpt >>> 15 703635848.895 ± 5296941.704 ops/s >>>[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k thrpt >>> 15 695537044.676 ± 17400763.731 ops/s >>>[java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k thrpt >>> 15 727725713.128 ± 4252436.331 ops/s >>> >>> To summarize, compression is 8.5% slower and decompression is 1% faster. >>> This is measuring the impact on compression/decompression not the huge >>> impact that would occur if we decompressed data we don't need less often. >>> >>> I didn't test decompression of Snappy and LZ4 high, but I did test >>> compression. 
>>> >>> Snappy: >>>[java] CompactIntegerSequenceBench.benchCompressSnappy16k thrpt >>> 2 196574766.116 ops/s >>>[java] CompactIntegerSequenceBench.benchCompressSnappy32k thrpt >>> 2 198538643.844 ops/s >>>[java] CompactIntegerSequenceBench.benchCompressSnappy64k thrpt >>> 2 194600497.613 ops/s >>>[java] CompactIntegerSequenceBench.benchCompressSnappy8kthrpt >>> 2 186040175.059 ops/s >>> >>> LZ4 high compressor: >>>[java] CompactIntegerSequenceBench.bench16k thrpt2 >>> 20822947.578 ops/s >>>[java] CompactIntegerSequenceBench.bench32k thrpt2 >>> 12037342.253 ops/s >>>[java] CompactIntegerSequenceBench.bench64k thrpt2 >>> 6782534.469 ops/s >>>[java] CompactIntegerSequenceBench.bench8k thrpt2 >>> 32254619.594 ops/s >>> >>> LZ4 high is the one instance where block size mattered a lot. It's a bit >>> suspicious really when you look at the ratio of performance to block size >>> being close to 1:1. I couldn't spot a bug in the benchmark though. >>> >>> Compression ratios with LZ4 fast for the text of Alice in Wonderland was: >>> >>> Chunk size 8192, ratio 0.709473 >>> Chunk size 16384, ratio 0.667236 >>> Chunk size 32768, ratio 0.634735 >>> Chunk size 65536, ratio 0.607208 >>> >>> By way of comparison I also ran deflate with maximum compression: >>> >>> Chunk size 8192, ratio 0.426434 >>> Chunk size 16384, ratio 0.402423 >>> Chunk size 32768, ratio 0.381627 >>> Chunk size 65536, ratio 0.364865 >>> >>> Ariel >>> On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote: FWIW, I’m not -0, just think that long after the freeze date a change like this needs a strong mandate from the community.
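Worth noting that the debate is only about the default: the per-table override already exists and is untouched by the freeze question. Illustrative CQL (keyspace, table and column names invented):

    CREATE TABLE ks.events (
        id uuid,
        ts timeuuid,
        payload blob,
        PRIMARY KEY (id, ts)
    ) WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 16};

    -- or, for an existing table (applies to newly written SSTables; existing ones
    -- keep their old chunk size until rewritten, e.g. by upgradesstables):
    ALTER TABLE ks.events
        WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 16};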
Re: CASSANDRA-13241 lower default chunk_length_in_kb
Hi, I ran some benchmarks on my laptop https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16656821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16656821 For a random read workload, varying chunk size: Chunk size Time 64k 25:20 64k 25:33 32k 20:01 16k 19:19 16k 19:14 8k 16:51 4k 15:39 Ariel On Thu, Oct 18, 2018, at 2:55 PM, Ariel Weisberg wrote: > Hi, > > For those who were asking about the performance impact of block size on > compression I wrote a microbenchmark. > > https://pastebin.com/RHDNLGdC > > [java] Benchmark Mode > Cnt Score Error Units > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16kthrpt > 15 331190055.685 ± 8079758.044 ops/s > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32kthrpt > 15 353024925.655 ± 7980400.003 ops/s > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64kthrpt > 15 365664477.654 ± 10083336.038 ops/s > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k thrpt > 15 305518114.172 ± 11043705.883 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k thrpt > 15 688369529.911 ± 25620873.933 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k thrpt > 15 703635848.895 ± 5296941.704 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k thrpt > 15 695537044.676 ± 17400763.731 ops/s > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k thrpt > 15 727725713.128 ± 4252436.331 ops/s > > To summarize, compression is 8.5% slower and decompression is 1% faster. > This is measuring the impact on compression/decompression not the huge > impact that would occur if we decompressed data we don't need less > often. > > I didn't test decompression of Snappy and LZ4 high, but I did test > compression. > > Snappy: > [java] CompactIntegerSequenceBench.benchCompressSnappy16k thrpt > 2 196574766.116 ops/s > [java] CompactIntegerSequenceBench.benchCompressSnappy32k thrpt > 2 198538643.844 ops/s > [java] CompactIntegerSequenceBench.benchCompressSnappy64k thrpt > 2 194600497.613 ops/s > [java] CompactIntegerSequenceBench.benchCompressSnappy8kthrpt > 2 186040175.059 ops/s > > LZ4 high compressor: > [java] CompactIntegerSequenceBench.bench16k thrpt2 > 20822947.578 ops/s > [java] CompactIntegerSequenceBench.bench32k thrpt2 > 12037342.253 ops/s > [java] CompactIntegerSequenceBench.bench64k thrpt2 > 6782534.469 ops/s > [java] CompactIntegerSequenceBench.bench8k thrpt2 > 32254619.594 ops/s > > LZ4 high is the one instance where block size mattered a lot. It's a bit > suspicious really when you look at the ratio of performance to block > size being close to 1:1. I couldn't spot a bug in the benchmark though. > > Compression ratios with LZ4 fast for the text of Alice in Wonderland was: > > Chunk size 8192, ratio 0.709473 > Chunk size 16384, ratio 0.667236 > Chunk size 32768, ratio 0.634735 > Chunk size 65536, ratio 0.607208 > > By way of comparison I also ran deflate with maximum compression: > > Chunk size 8192, ratio 0.426434 > Chunk size 16384, ratio 0.402423 > Chunk size 32768, ratio 0.381627 > Chunk size 65536, ratio 0.364865 > > Ariel > > On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote: > > FWIW, I’m not -0, just think that long after the freeze date a change > > like this needs a strong mandate from the community. I think the change > > is a good one. 
> > > > > > > > > > > > > On 17 Oct 2018, at 22:09, Ariel Weisberg wrote: > > > > > > Hi, > > > > > > It's really not appreciably slower compared to the decompression we are > > > going to do which is going to take several microseconds. Decompression is > > > also going to be faster because we are going to do less unnecessary > > > decompression and the decompression itself may be faster since it may fit > > > in a higher level cache better. I ran a microbenchmark comparing them. > > > > > > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988 > > > > > > Fetching a long from memory: 56 nanoseconds > > > Compact integer sequence : 80 nanoseconds > > > Summing integer sequence : 165 nanoseconds > > > > > > Currently we have one +1 from Kurt to change the representation and > > > possibly a -0 from Benedict. That's not really enough to make an > > > exception to the code freeze. If you want it to happen (or not) you need > > > to speak up otherwise only the default will change. > > > > > > Regards, > > > A
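The flip side of smaller chunks in those numbers is the off-heap compression metadata, which is why the compact representation is being discussed at all. Rough arithmetic, assuming on the order of one 8-byte offset per chunk in the offset map:

    1 TiB of compressed SSTable data:
      64 KiB chunks ->  16M chunks -> ~128 MiB of offsets
      16 KiB chunks ->  64M chunks -> ~512 MiB of offsets
       4 KiB chunks -> 256M chunks ->   ~2 GiB of offsets

so dropping the default from 64 KiB to 16 KiB roughly quadruples that map, and the compact integer sequence is meant to claw most of that back.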
Re: CASSANDRA-13241 lower default chunk_length_in_kb
> > The predominant phrased used in that thread was 'feature freeze'. At the risk of hijacking this thread, when are we going to transition from "no new features, change whatever else you want including refactoring and changing years-old defaults" to "ok, we think we have something that's stable, time to start testing"? Right now, if the community starts aggressively testing 4.0 with all the changes still in flight, there's likely going to be a lot of wasted effort. I think the root of the disconnect was that when we discussed "freeze" on the mailing list, it was in the context of getting everyone engaged in testing 4.0. On Fri, Oct 19, 2018 at 9:46 AM Ariel Weisberg wrote: > Hi, > > I ran some benchmarks on my laptop > > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16656821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16656821 > > For a random read workload, varying chunk size: > Chunk size Time >64k 25:20 >64k 25:33 >32k 20:01 >16k 19:19 >16k 19:14 > 8k 16:51 > 4k 15:39 > > Ariel > On Thu, Oct 18, 2018, at 2:55 PM, Ariel Weisberg wrote: > > Hi, > > > > For those who were asking about the performance impact of block size on > > compression I wrote a microbenchmark. > > > > https://pastebin.com/RHDNLGdC > > > > [java] Benchmark > Mode > > Cnt Score Error Units > > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k > thrpt > > 15 331190055.685 ± 8079758.044 ops/s > > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k > thrpt > > 15 353024925.655 ± 7980400.003 ops/s > > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k > thrpt > > 15 365664477.654 ± 10083336.038 ops/s > > [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k > thrpt > > 15 305518114.172 ± 11043705.883 ops/s > > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k > thrpt > > 15 688369529.911 ± 25620873.933 ops/s > > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k > thrpt > > 15 703635848.895 ± 5296941.704 ops/s > > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k > thrpt > > 15 695537044.676 ± 17400763.731 ops/s > > [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k > thrpt > > 15 727725713.128 ± 4252436.331 ops/s > > > > To summarize, compression is 8.5% slower and decompression is 1% faster. > > This is measuring the impact on compression/decompression not the huge > > impact that would occur if we decompressed data we don't need less > > often. > > > > I didn't test decompression of Snappy and LZ4 high, but I did test > compression. > > > > Snappy: > > [java] CompactIntegerSequenceBench.benchCompressSnappy16k thrpt > > > 2 196574766.116 ops/s > > [java] CompactIntegerSequenceBench.benchCompressSnappy32k thrpt > > > 2 198538643.844 ops/s > > [java] CompactIntegerSequenceBench.benchCompressSnappy64k thrpt > > > 2 194600497.613 ops/s > > [java] CompactIntegerSequenceBench.benchCompressSnappy8kthrpt > > > 2 186040175.059 ops/s > > > > LZ4 high compressor: > > [java] CompactIntegerSequenceBench.bench16k thrpt2 > > 20822947.578 ops/s > > [java] CompactIntegerSequenceBench.bench32k thrpt2 > > 12037342.253 ops/s > > [java] CompactIntegerSequenceBench.bench64k thrpt2 > > 6782534.469 ops/s > > [java] CompactIntegerSequenceBench.bench8k thrpt2 > > 32254619.594 ops/s > > > > LZ4 high is the one instance where block size mattered a lot. It's a bit > > suspicious really when you look at the ratio of performance to block > > size being close to 1:1. I couldn't spot a bug in the benchmark though. 
> > > > Compression ratios with LZ4 fast for the text of Alice in Wonderland was: > > > > Chunk size 8192, ratio 0.709473 > > Chunk size 16384, ratio 0.667236 > > Chunk size 32768, ratio 0.634735 > > Chunk size 65536, ratio 0.607208 > > > > By way of comparison I also ran deflate with maximum compression: > > > > Chunk size 8192, ratio 0.426434 > > Chunk size 16384, ratio 0.402423 > > Chunk size 32768, ratio 0.381627 > > Chunk size 65536, ratio 0.364865 > > > > Ariel > > > > On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote: > > > FWIW, I’m not -0, just think that long after the freeze date a change > > > like this needs a strong mandate from the community. I think the > change > > > is a good one. > > > > > > > > > > > > > > > > > > > On 17 Oct 2018, at 22:09, Ariel Weisberg wrote: > > > > > > > > Hi, > > > > > > > > It's really not appreciably slower compared to the decompression we > are going to do which is going to take several microseconds. Decompression > is also going to be faster because we are going to do less unnecessary > decompression and the decompression itself may be faster since it
Re: Built in trigger: double-write for app migration
new DC and then split is one way, but you have to wait for it to stream, and then how do you know the dc coherence is good enough to switch the targetted DC for local_quorum? And then once we split it we'd have downtime to "change the name" and other work that would distinguish it from the original cluster, from what I'm told from the peoples that do the DC / cluster setup and aws provisioning. It is a tool in the toolchest... We might be able to get stats of the queries and updates impacting the cluster in a centralized manner with a trigger too. We will probably do stream-to-kafka trigger based on what is on the intarweb and since we have kafka here already. I will look at CDC. Thank you everybody! On Fri, Oct 19, 2018 at 3:29 AM Antonis Papaioannou wrote: > It reminds me of “shadow writes” described in [1]. > During data migration the coordinator forwards a copy of any write > request regarding tokens that are being transferred to the new node. > > [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17, > https://ieeexplore.ieee.org/document/8069080 > > > > On 18 Oct 2018, at 18:53, Carl Mueller > > > wrote: > > > > tl;dr: a generic trigger on TABLES that will mirror all writes to > > facilitate data migrations between clusters or systems. What is necessary > > to ensure full write mirroring/coherency? > > > > When cassandra clusters have several "apps" aka keyspaces serving > > applications colocated on them, but the app/keyspace bandwidth and size > > demands begin impacting other keyspaces/apps, then one strategy is to > > migrate the keyspace to its own dedicated cluster. > > > > With backups/sstableloading, this will entail a delay and therefore a > > "coherency" shortfall between the clusters. So typically one would > employ a > > "double write, read once": > > > > - all updates are mirrored to both clusters > > - writes come from the current most coherent. > > > > Often two sstable loads are done: > > > > 1) first load > > 2) turn on double writes/write mirroring > > 3) a second load is done to finalize coherency > > 4) switch the app to point to the new cluster now that it is coherent > > > > The double writes and read is the sticking point. We could do it at the > app > > layer, but if the app wasn't written with that, it is a lot of testing > and > > customization specific to the framework. > > > > We could theoretically do some sort of proxying of the java-driver > somehow, > > but all the async structures and complex interfaces/apis would be > difficult > > to proxy. Maybe there is a lower level in the java-driver that is > possible. > > This also would only apply to the java-driver, and not > > python/go/javascript/other drivers. > > > > Finally, I suppose we could do a trigger on the tables. It would be > really > > nice if we could add to the cassandra toolbox the basics of a write > > mirroring trigger that could be activated "fairly easily"... now I know > > there are the complexities of inter-cluster access, and if we are even > > using cassandra as the target mirror system (for example there is an > > article on triggers write-mirroring to kafka: > > https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1). > > > > And this starts to get into the complexities of hinted handoff as well. > But > > fundamentally this seems something that would be a very nice feature > > (especially when you NEED it) to have in the core of cassandra. 
> > > > Finally, is the mutation hook in triggers sufficient to track all > incoming > > mutations (outside of "shudder" other triggers generating data) > >
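If CDC ends up being the route (it landed in 3.8, so it is not an option on the 2.1/2.2 clusters mentioned later in this thread), the moving parts are roughly the following; paths and table name are illustrative:

    # cassandra.yaml
    cdc_enabled: true
    cdc_raw_directory: /var/lib/cassandra/cdc_raw

    -- per table
    ALTER TABLE ks.mytable WITH cdc = true;

plus a consumer that tails the commit log segments landing in cdc_raw and replays them against the target cluster - the replay/dedup logic is on you, same as with a trigger.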
Re: CASSANDRA-13241 lower default chunk_length_in_kb
Shall we move this discussion to a separate thread? I agree it needs to be had, but this will definitely derail this discussion. To respond only to the relevant portion for this thread: > changing years-old defaults I don’t see how age is relevant? This isn’t some ‘battle hardened’ feature we’re changing - most users don’t even know to change this parameter, so we can’t claim its length of existence works in its favour. The project had fewer resources to be as thorough when this tickets landed, so we can’t even claim we’re overturning careful work. This default was defined in 2011 with no performance comparisons with other possible sizes, or justification for the selection made on ticket (CASSANDRA-47 - yes, they once went down to two digits!). That’s not to say this wasn’t a fine default - it was. In this case, age has actively worked against it. Since 2011, SSDs have become the norm, and most servers have more memory than we are presently able to utilise effectively. This is a no brainer, and doesn’t have any impact on testing. Tests run with 64KiB are just as valid as those run with 16KiB. Performance tests should anyway compare like-to-like, so this is completely testing neutral AFAICT. > On 19 Oct 2018, at 15:16, Joshua McKenzie wrote: > > At the risk of hijacking this thread, when are we going to transition from > "no new features, change whatever else you want including refactoring and > changing years-old defaults" to "ok, we think we have something that's > stable, time to start testing"? > > Right now, if the community starts aggressively testing 4.0 with all the > changes still in flight, there's likely going to be a lot of wasted effort. > I think the root of the disconnect was that when we discussed "freeze" on > the mailing list, it was in the context of getting everyone engaged in > testing 4.0.
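To make the SSD point concrete (rough arithmetic; assumes 4 KiB flash pages and a point read that only needs a few hundred bytes):

    64 KiB chunk: ~16 pages read, 64 KiB decompressed per row touched
    16 KiB chunk:  ~4 pages read, 16 KiB decompressed
     4 KiB chunk: ~1-2 pages read,  4 KiB decompressed

which is consistent with the random-read numbers earlier in the thread (25:20 at 64k down to 15:39 at 4k).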
Re: CASSANDRA-13241 lower default chunk_length_in_kb
On 10/19/18 9:16 AM, Joshua McKenzie wrote: > > At the risk of hijacking this thread, when are we going to transition from > "no new features, change whatever else you want including refactoring and > changing years-old defaults" to "ok, we think we have something that's > stable, time to start testing"? Creating a cassandra-4.0 branch would allow trunk to, for instance, get a default config value change commit and get more testing. We might forget again, from what I understand of Benedict's last comment :) -- Michael
Re: Built in trigger: double-write for app migration
Also we have 2.1.x and 2.2 clusters, so we can't use CDC since apparently that is a 3.8 feature. Virtual tables are very exciting so we could do some collating stuff (which I'd LOVE to do with our scheduling application where we can split tasks into near term/most frequent(hours to days), medium-term/less common(days to weeks), long/years ), with the aim of avoiding having to do compaction at all and just truncating buckets as they "expire" for a nice O(1) compaction process. On Fri, Oct 19, 2018 at 9:57 AM Carl Mueller wrote: > new DC and then split is one way, but you have to wait for it to stream, > and then how do you know the dc coherence is good enough to switch the > targetted DC for local_quorum? And then once we split it we'd have downtime > to "change the name" and other work that would distinguish it from the > original cluster, from what I'm told from the peoples that do the DC / > cluster setup and aws provisioning. It is a tool in the toolchest... > > We might be able to get stats of the queries and updates impacting the > cluster in a centralized manner with a trigger too. > > We will probably do stream-to-kafka trigger based on what is on the > intarweb and since we have kafka here already. > > I will look at CDC. > > Thank you everybody! > > > On Fri, Oct 19, 2018 at 3:29 AM Antonis Papaioannou > wrote: > >> It reminds me of “shadow writes” described in [1]. >> During data migration the coordinator forwards a copy of any write >> request regarding tokens that are being transferred to the new node. >> >> [1] Incremental Elasticity for NoSQL Data Stores, SRDS’17, >> https://ieeexplore.ieee.org/document/8069080 >> >> >> > On 18 Oct 2018, at 18:53, Carl Mueller >> > >> wrote: >> > >> > tl;dr: a generic trigger on TABLES that will mirror all writes to >> > facilitate data migrations between clusters or systems. What is >> necessary >> > to ensure full write mirroring/coherency? >> > >> > When cassandra clusters have several "apps" aka keyspaces serving >> > applications colocated on them, but the app/keyspace bandwidth and size >> > demands begin impacting other keyspaces/apps, then one strategy is to >> > migrate the keyspace to its own dedicated cluster. >> > >> > With backups/sstableloading, this will entail a delay and therefore a >> > "coherency" shortfall between the clusters. So typically one would >> employ a >> > "double write, read once": >> > >> > - all updates are mirrored to both clusters >> > - writes come from the current most coherent. >> > >> > Often two sstable loads are done: >> > >> > 1) first load >> > 2) turn on double writes/write mirroring >> > 3) a second load is done to finalize coherency >> > 4) switch the app to point to the new cluster now that it is coherent >> > >> > The double writes and read is the sticking point. We could do it at the >> app >> > layer, but if the app wasn't written with that, it is a lot of testing >> and >> > customization specific to the framework. >> > >> > We could theoretically do some sort of proxying of the java-driver >> somehow, >> > but all the async structures and complex interfaces/apis would be >> difficult >> > to proxy. Maybe there is a lower level in the java-driver that is >> possible. >> > This also would only apply to the java-driver, and not >> > python/go/javascript/other drivers. >> > >> > Finally, I suppose we could do a trigger on the tables. It would be >> really >> > nice if we could add to the cassandra toolbox the basics of a write >> > mirroring trigger that could be activated "fairly easily"... 
now I know >> > there are the complexities of inter-cluster access, and if we are even >> > using cassandra as the target mirror system (for example there is an >> > article on triggers write-mirroring to kafka: >> > https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1). >> > >> > And this starts to get into the complexities of hinted handoff as well. >> But >> > fundamentally this seems something that would be a very nice feature >> > (especially when you NEED it) to have in the core of cassandra. >> > >> > Finally, is the mutation hook in triggers sufficient to track all >> incoming >> > mutations (outside of "shudder" other triggers generating data) >> >>
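A rough sketch of the bucket-per-table idea mentioned there (keyspace, table and bucket names are invented), where expiry becomes a metadata-only drop instead of tombstones plus compaction:

    -- one table per time bucket
    CREATE TABLE sched.tasks_2018_w42 (
        shard int,
        due timestamp,
        task_id uuid,
        payload blob,
        PRIMARY KEY (shard, due, task_id)
    );

    -- once the whole bucket is in the past, drop it wholesale:
    TRUNCATE sched.tasks_2018_w42;   -- or DROP TABLE sched.tasks_2018_w42;

The application has to route reads and writes to the right bucket, which is the collating layer being described above.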
Re: CASSANDRA-13241 lower default chunk_length_in_kb
(We should definitely harden the definition for freeze in a separate thread) My thinking is that this is the best time to do this change as we have not even cut alpha or beta. All the people involved in the test will definitely be testing it again when we have these releases. > On Oct 19, 2018, at 8:00 AM, Michael Shuler wrote: > >> On 10/19/18 9:16 AM, Joshua McKenzie wrote: >> >> At the risk of hijacking this thread, when are we going to transition from >> "no new features, change whatever else you want including refactoring and >> changing years-old defaults" to "ok, we think we have something that's >> stable, time to start testing"? > > Creating a cassandra-4.0 branch would allow trunk to, for instance, get > a default config value change commit and get more testing. We might > forget again, from what I understand of Benedict's last comment :) > > -- > Michael
Re: Using Cassandra as local db without cluster
Thanks for the info, I will try Aerospike also; they always include a one-node installation in their benchmarking and also talk about vertical scalability. Kind regards. On Thu, Oct 18, 2018, 14:44 Aleksey Yeshchenko wrote: > I agree with Jeff here. > > Furthermore, Cassandra should generally be your solution of last resort - > if nothing else works out. > > In your case I’d try sqlite or leveldb (or rocksdb). > > > On 18 Oct 2018, at 11:46, Jeff Jirsa wrote: > > > > I can’t think of a situation where I’d choose Cassandra as a database in > a single-host use case (if you’re sure it’ll never be more than one > machine). > > > > -- > > Jeff Jirsa > > > > > >> On Oct 18, 2018, at 12:31 PM, Abdelkrim Fitouri > wrote: > >> > >> Hello, > >> > >> I am wondering if using cassandra as one local database without the > cluster > >> capabilities has a sens, (i cannot do multi node cluster due to a > technical > >> constraint) > >> > >> I have an application with a purpose to store a dynamic number of > colones > >> on each rows (thing that i cannot do with classical relational > database), > >> and i don't want to use documents based nosql database to avoid using > Json > >> marshal and unmarshal treatments... > >> > >> Does cassandra with only one node and with a well designer model based > on > >> queries and partition keys can lead to best performance than postgresql > ? > >> > >> Does cassandra have some limitation about the size of data ? about the > >> number of partition on a node ? > >> > >> Thanks for any details or help. > >> > >> -- > >> > >> Best Regards.
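On the quoted "dynamic number of columns" requirement: in CQL that is normally modelled with a clustering column (or a map) rather than a document store. Illustrative sketch (names invented):

    CREATE TABLE app.attributes (
        entity_id uuid,
        attr_name text,
        attr_value text,
        PRIMARY KEY (entity_id, attr_name)
    );

Each "dynamic column" becomes one (attr_name, attr_value) row within the partition; the same shape also works as a plain key/value side table (or a jsonb column) in PostgreSQL, or as keys in the embedded stores suggested above.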
Re: CASSANDRA-13241 lower default chunk_length_in_kb
I think we should try to do the right thing for the most people that we can. The number of folks impacted by 64KB is huge. I've worked on a lot of clusters created by a lot of different teams, going from brand new to pretty damn knowledgeable. I can't think of a single time over the last 2 years that I've seen a cluster use non-default settings for compression. With only a handful of exceptions, I've lowered the chunk size considerably (usually to 4 or 8K) and the impact has always been very noticeable, frequently resulting in hardware reduction and cost savings. Of all the poorly chosen defaults we have, this is one of the biggest offenders that I see. There's a good reason ScyllaDB claims they're so much faster than Cassandra - we ship a DB that performs poorly for 90+% of teams because we ship for a specific use case, not a general one (time series on memory constrained boxes being the specific use case) This doesn't impact existing tables, just new ones. More and more teams are using Cassandra as a general purpose database, we should acknowledge that adjusting our defaults accordingly. Yes, we use a little bit more memory on new tables if we just change this setting, and what we get out of it is a massive performance win. I'm +1 on the change as well. On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli wrote: > (We should definitely harden the definition for freeze in a separate > thread) > > My thinking is that this is the best time to do this change as we have not > even cut alpha or beta. All the people involved in the test will definitely > be testing it again when we have these releases. > > > On Oct 19, 2018, at 8:00 AM, Michael Shuler > wrote: > > > >> On 10/19/18 9:16 AM, Joshua McKenzie wrote: > >> > >> At the risk of hijacking this thread, when are we going to transition > from > >> "no new features, change whatever else you want including refactoring > and > >> changing years-old defaults" to "ok, we think we have something that's > >> stable, time to start testing"? > > > > Creating a cassandra-4.0 branch would allow trunk to, for instance, get > > a default config value change commit and get more testing. We might > > forget again, from what I understand of Benedict's last comment :) > > > > -- > > Michael > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > > -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: CASSANDRA-13241 lower default chunk_length_in_kb
Sorry, to be clear - I'm +1 on changing the configuration default, but I think changing the compression in memory representations warrants further discussion and investigation before making a case for or against it yet. An optimization that reduces in memory cost by over 50% sounds pretty good and we never were really explicit that those sort of optimizations would be excluded after our feature freeze. I don't think they should necessarily be excluded at this time, but it depends on the size and risk of the patch. On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad wrote: > I think we should try to do the right thing for the most people that we > can. The number of folks impacted by 64KB is huge. I've worked on a lot > of clusters created by a lot of different teams, going from brand new to > pretty damn knowledgeable. I can't think of a single time over the last 2 > years that I've seen a cluster use non-default settings for compression. > With only a handful of exceptions, I've lowered the chunk size considerably > (usually to 4 or 8K) and the impact has always been very noticeable, > frequently resulting in hardware reduction and cost savings. Of all the > poorly chosen defaults we have, this is one of the biggest offenders that I > see. There's a good reason ScyllaDB claims they're so much faster than > Cassandra - we ship a DB that performs poorly for 90+% of teams because we > ship for a specific use case, not a general one (time series on memory > constrained boxes being the specific use case) > > This doesn't impact existing tables, just new ones. More and more teams > are using Cassandra as a general purpose database, we should acknowledge > that adjusting our defaults accordingly. Yes, we use a little bit more > memory on new tables if we just change this setting, and what we get out of > it is a massive performance win. > > I'm +1 on the change as well. > > > > On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli > wrote: > >> (We should definitely harden the definition for freeze in a separate >> thread) >> >> My thinking is that this is the best time to do this change as we have >> not even cut alpha or beta. All the people involved in the test will >> definitely be testing it again when we have these releases. >> >> > On Oct 19, 2018, at 8:00 AM, Michael Shuler >> wrote: >> > >> >> On 10/19/18 9:16 AM, Joshua McKenzie wrote: >> >> >> >> At the risk of hijacking this thread, when are we going to transition >> from >> >> "no new features, change whatever else you want including refactoring >> and >> >> changing years-old defaults" to "ok, we think we have something that's >> >> stable, time to start testing"? >> > >> > Creating a cassandra-4.0 branch would allow trunk to, for instance, get >> > a default config value change commit and get more testing. We might >> > forget again, from what I understand of Benedict's last comment :) >> > >> > -- >> > Michael >> > >> > - >> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> > For additional commands, e-mail: dev-h...@cassandra.apache.org >> > >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: dev-h...@cassandra.apache.org >> >> > > -- > Jon Haddad > http://www.rustyrazorblade.com > twitter: rustyrazorblade > -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: Deprecating/removing PropertyFileSnitch?
Do you mean to say that during host replacement there may be a time when the old->new host isn’t fully propagated and therefore wouldn’t yet be in all system tables? > On Oct 17, 2018, at 4:20 PM, sankalp kohli wrote: > > This is not the case during host replacement correct? > > On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan < > jeremiah.jor...@gmail.com> wrote: > >> As long as we are correctly storing such things in the system tables and >> reading them out of the system tables when we do not have the information >> from gossip yet, it should not be a problem. (As far as I know GPFS does >> this, but I have not done extensive code diving or testing to make sure all >> edge cases are covered there) >> >> -Jeremiah >> >>> On Oct 16, 2018, at 11:56 AM, sankalp kohli >> wrote: >>> >>> Will GossipingPropertyFileSnitch not be vulnerable to Gossip bugs where >> we >>> lose hostId or some other fields when we restart C* for large >>> clusters(~1000 instances)? >>> >>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa wrote: >>> We should, but the 4.0 features that log/reject verbs to invalid >> replicas solves a lot of the concerns here -- Jeff Jirsa > On Oct 16, 2018, at 4:10 PM, Jeremy Hanna wrote: > > We have had PropertyFileSnitch for a long time even though GossipingPropertyFileSnitch is effectively a superset of what it offers >> and is much less error prone. There are some unexpected behaviors when >> things aren’t configured correctly with PFS. For example, if you replace >> nodes in one DC and add those nodes to that DCs property files and not the other >> DCs property files - the resulting problems aren’t very straightforward to troubleshoot. > > We could try to improve the resilience and fail fast error checking and error reporting of PFS, but honestly, why wouldn’t we deprecate and >> remove PropertyFileSnitch? Are there reasons why GPFS wouldn’t be sufficient >> to replace it? > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: dev-h...@cassandra.apache.org >> >> - To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: Deprecating/removing PropertyFileSnitch?
Say you restarted all instances in the cluster and status for some host goes missing. Now when you start a host replacement, the new host won’t learn about the host whose status is missing and the view of this host will be wrong. PS: I will be happy to be proved wrong as I can also start using Gossip snitch :) > On Oct 19, 2018, at 2:41 PM, Jeremy Hanna wrote: > > Do you mean to say that during host replacement there may be a time when the > old->new host isn’t fully propagated and therefore wouldn’t yet be in all > system tables? > >> On Oct 17, 2018, at 4:20 PM, sankalp kohli wrote: >> >> This is not the case during host replacement correct? >> >> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan < >> jeremiah.jor...@gmail.com> wrote: >> >>> As long as we are correctly storing such things in the system tables and >>> reading them out of the system tables when we do not have the information >>> from gossip yet, it should not be a problem. (As far as I know GPFS does >>> this, but I have not done extensive code diving or testing to make sure all >>> edge cases are covered there) >>> >>> -Jeremiah >>> On Oct 16, 2018, at 11:56 AM, sankalp kohli >>> wrote: Will GossipingPropertyFileSnitch not be vulnerable to Gossip bugs where >>> we lose hostId or some other fields when we restart C* for large clusters(~1000 instances)? > On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa wrote: > > We should, but the 4.0 features that log/reject verbs to invalid >>> replicas > solves a lot of the concerns here > > -- > Jeff Jirsa > > >> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna > wrote: >> >> We have had PropertyFileSnitch for a long time even though > GossipingPropertyFileSnitch is effectively a superset of what it offers >>> and > is much less error prone. There are some unexpected behaviors when >>> things > aren’t configured correctly with PFS. For example, if you replace >>> nodes in > one DC and add those nodes to that DCs property files and not the other >>> DCs > property files - the resulting problems aren’t very straightforward to > troubleshoot. >> >> We could try to improve the resilience and fail fast error checking and > error reporting of PFS, but honestly, why wouldn’t we deprecate and >>> remove > PropertyFileSnitch? Are there reasons why GPFS wouldn’t be sufficient >>> to > replace it? >> - >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: dev-h...@cassandra.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > > >>> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >>> For additional commands, e-mail: dev-h...@cassandra.apache.org >>> >>> > > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org
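For readers following along, the operational difference being weighed here (addresses, DC and rack names are illustrative):

    # cassandra-topology.properties (PropertyFileSnitch): every node must carry the
    # full, identical map of every other node, and keep it in sync by hand
    10.0.1.1=DC1:RAC1
    10.0.1.2=DC1:RAC2
    10.0.2.1=DC2:RAC1

    # cassandra-rackdc.properties (GossipingPropertyFileSnitch): each node declares
    # only itself; the rest propagates via gossip and is persisted in the local
    # system tables
    dc=DC1
    rack=RAC1

with endpoint_snitch set accordingly in cassandra.yaml. The replacement/restart edge cases discussed above are about whether that gossip-propagated state is always recoverable at the moment it is needed.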