Re: Request for Thoughts on Deployments on AWS EC2 vs. ECS

2025-06-11 Thread Dor Laor via user
It's possible to manage Cassandra well both with VMs and containers. As you'd be running one container per VM, there is no significant advantage for containers. K8s provides nice tooling and some methodological enforcement, which brings order to the setup, but if the team aren't top-notch experts in

Re: Send large blobs

2022-05-31 Thread Dor Laor
On Tue, May 31, 2022 at 4:40 PM Andria Trigeorgi wrote: > Hi, > > I want to write large blobs in Cassandra. However, when I tried to write > more than a 256MB blob, I got the message: > "Error from server: code=2200 [Invalid query] message=\"Request is too > big: length 268435580 exceeds maximum

Re: about the performance of select * from tbl

2022-04-26 Thread Dor Laor
select * reads all of the data from the cluster, so obviously it would be bad to run a single query and expect it to return 'fast'. The best way is to divide the data set into chunks which will be selected by the range ownership per node, so you'll be able to query in parallel the entire cluste
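The chunking idea above can be sketched roughly as follows. This is a minimal sketch, not the approach from the thread: it assumes the Murmur3 partitioner's signed 64-bit token space and splits the ring into equal chunks, whereas the post suggests aligning chunks with per-node range ownership. The table name `ks.tbl` and partition key column `pk` are hypothetical.

```python
# Assumption: Murmur3 tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(chunks):
    """Split the full token ring into `chunks` contiguous (start, end) ranges."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // chunks for i in range(chunks)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(table, pk, chunks):
    """Build one range-restricted SELECT per chunk (hypothetical table/column)."""
    queries = []
    for i, (start, end) in enumerate(split_token_range(chunks)):
        # The first chunk includes its lower bound; later chunks exclude it
        # so that no token is read twice.
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({pk}) {lower} {start} AND token({pk}) <= {end}"
        )
    return queries


for query in scan_queries("ks.tbl", "pk", 4):
    print(query)
```

Each generated query can then be issued independently, which is what makes the scan parallelizable.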

Re: CDC Tools

2020-05-27 Thread Dor Laor
If it's helpful, IMO, the approach Cassandra needs to take isn't by tracking the individual node commit log and putting the burden on the client. At Scylla, we had the 'opportunity' to be a latecomer and see what approach Cassandra took and what DynamoDB streams took. We've implemented CDC as a r

Re: What does "PER PARTITION LIMIT" means in cql query in cassandra?

2020-05-07 Thread Dor Laor
In your schema case, for each client_id you will get a single 'when' row. Just one. Even when there are multiple rows (clustering keys) On Thu, May 7, 2020 at 12:14 AM Check Peck wrote: > > I have a scylla table as shown below: > > > cqlsh:sampleks> describe table test; > > > CREATE TABLE
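To illustrate the semantics described above, here is a small in-memory sketch. It assumes the query used PER PARTITION LIMIT 1, that `client_id` is the partition key and `when` a clustering column (as the reply implies), and that rows arrive already sorted in clustering order. The sample rows are invented for illustration.

```python
from collections import defaultdict


def per_partition_limit(rows, partition_key, limit):
    """Keep at most `limit` rows per partition, preserving input order."""
    seen = defaultdict(int)
    kept = []
    for row in rows:
        key = row[partition_key]
        if seen[key] < limit:
            seen[key] += 1
            kept.append(row)
    return kept


# Invented sample data: two partitions with several clustering rows each.
rows = [
    {"client_id": 1, "when": "2020-05-01"},
    {"client_id": 1, "when": "2020-05-02"},
    {"client_id": 1, "when": "2020-05-03"},
    {"client_id": 2, "when": "2020-05-01"},
    {"client_id": 2, "when": "2020-05-04"},
]

for row in per_partition_limit(rows, "client_id", 1):
    print(row)
```

With a limit of 1, only the first row of each partition survives, no matter how many clustering rows the partition holds.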

Re: Disabling Swap for Cassandra

2020-04-16 Thread Dor Laor
to-lock-the-pages-of-a-process-in-memory > > > Thanks > Kunal > > On Thu, Apr 16, 2020 at 4:31 PM Dor Laor wrote: >> >> It is good to configure swap for the OS but exempt Cassandra >> from swapping. Why is it good? Since you never know the >> memory utiliz

Re: Disabling Swap for Cassandra

2020-04-16 Thread Dor Laor
It is good to configure swap for the OS but exempt Cassandra from swapping. Why is it good? Because you never know the memory utilization of additional agents and processes you or other admins will run on your server. So do configure a swap partition. You can control the eagerness of the kernel by t

Re: sstableloader: How much does it actually need?

2020-02-05 Thread Dor Laor
Another option is to use the Spark migrator, which reads a source CQL cluster and writes to another. It has a validation stage that compares a full scan and reports the diff: https://github.com/scylladb/scylla-migrator There are many more ways to clone a cluster. My main recommendation is to 'optimiz

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Dor Laor
Another option instead of raw sstables is to use the Spark Migrator [1]. It reads a source cluster, can make some transformations (like table/column naming) and writes to a target cluster. It's a very convenient tool, OSS and free of charge. [1] https://github.com/scylladb/scylla-migrator On Fri,

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-10 Thread Dor Laor
he, you’re >> also benefitting from the decompression. However I’ve started to wonder >> how often sstable compression is worth the performance drag and internal C* >> complexity. If you compare to where a more traditional RDBMS would use >> compression, e.g. Postgres, use

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread Dor Laor
The DynamoDB model has several key benefits over Cassandra's. The most notable one is the tablet concept - data is partitioned into 10GB chunks. So scaling happens where such a tablet reaches maximum capacity and it is automatically divided to two. It can happen in parallel across the entire data s

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Dor Laor
rom the realtime workload for isolation and low latency guarantees. We addressed this problem elsewhere, beyond this scope. > > > > Sean Durity > > > > *From:* Dor Laor > *Sent:* Friday, January 04, 2019 4:21 PM > *To:* user@cassandra.apache.org > *Subject:

Re: Cassandra Splitting databases

2019-01-06 Thread Dor Laor
a bit with multi DC setup with docker images and repeat it with a test dataset until you are confident about the commands and their outcome. This example should work with Cassandra: https://www.scylladb.com/2018/03/28/mms-day7-multidatacenter-consistency/ On Sat, Jan 5, 2019 at 5:57 AM R1 J1 wr

Re: Cassandra Splitting databases

2019-01-04 Thread Dor Laor
Not sure I understand correctly but if you have one cluster with 2 separate datacenters you can define keyspace A to be on premise with a single DC and keyspace B only on Azure. On Fri, Jan 4, 2019 at 2:23 PM R1 J1 wrote: > We currently have 2 databases (A and B ) on a 6 node cluster. > 3 nod
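A keyspace can be pinned to one datacenter by listing only that datacenter in its replication settings. The sketch below builds such statements as strings. The datacenter names `onprem` and `azure` and the replication factor of 3 are assumptions for illustration; the real names are whatever the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, dc_replicas):
    """Build a CREATE KEYSPACE statement replicating only to the listed DCs."""
    if not dc_replicas:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_replicas.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Hypothetical datacenter names and replication factors.
print(create_keyspace_cql("a", {"onprem": 3}))
print(create_keyspace_cql("b", {"azure": 3}))
```

Because a datacenter that is not listed receives no replicas, keyspace `a` would live only on premise and keyspace `b` only on Azure.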

Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-04 Thread Dor Laor
I strongly recommend option B, separate clusters. Reasons: - Networking of node-node is negligible compared to networking within the node - Different scaling considerations Your workload may require 10 Spark nodes and 20 database nodes, so why bundle them? This ratio may also change over ti

Re: Migrating from DSE5.1.2 to Opensource cassandra

2018-12-05 Thread Dor Laor
An alternative approach is to form a new cluster and leave the original cluster alive (often a must, since it needs to be online 24x7). Double write to the two clusters and later migrate the data to the new one, either by taking a snapshot and passing those files to the new cluster or with sstablel

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
e LWT/CAS to guarantee state if you have a data model where it >> matters. >> >> >> -- >> Jeff Jirsa >> >> >> On Mar 8, 2018, at 6:18 PM, Dor Laor wrote: >> >> While NTP on the servers is important, make sure that you use client >> ti

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
> matters. > > -- > Jeff Jirsa > > > On Mar 8, 2018, at 6:18 PM, Dor Laor wrote: > > While NTP on the servers is important, make sure that you use client > timestamps and > not server. Since the last write wins, the data generator should be the > one setting its timestamp. >

Re: Amazon Time Sync Service + ntpd vs chrony

2018-03-08 Thread Dor Laor
While NTP on the servers is important, make sure that you use client timestamps and not server. Since the last write wins, the data generator should be the one setting its timestamp. On Thu, Mar 8, 2018 at 2:12 PM, Ben Slater wrote: > It is important to make sure you are using the same NTP serve
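The reasoning behind client timestamps can be shown with a toy last-write-wins resolver. This is a simplified sketch, not Cassandra's implementation: it ignores tie-breaking and uses made-up timestamp values. In CQL, a client-chosen timestamp can be supplied with `USING TIMESTAMP`.

```python
def apply_writes(writes):
    """Return the value that wins under last-write-wins.

    `writes` is a sequence of (value, timestamp) pairs in arrival order;
    the pair with the highest timestamp wins (ties are ignored here).
    """
    winner = None
    for value, timestamp in writes:
        if winner is None or timestamp > winner[1]:
            winner = (value, timestamp)
    return winner[0]


# The client produced "v1" at t=100 and then "v2" at t=200,
# but the network delivered them in the opposite order.
client_stamped = [("v2", 200), ("v1", 100)]

# If the server stamps each write on arrival, the later arrival
# ("v1", the older data) receives the higher timestamp.
server_stamped = [("v2", 1000), ("v1", 1001)]

print(apply_writes(client_stamped))
print(apply_writes(server_stamped))
```

With client timestamps the newer value survives reordering; with arrival-time stamps the stale value overwrites it.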

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread Dor Laor
I think you're introducing a layer violation. GDPR is a business requirement and compaction is an implementation detail. IMHO it's enough to delete the partition using regular CQL. It's true that it won't be deleted immediately but it will be eventually deleted (welcome to eventual consistency ;).

Re: Too many open files

2018-01-22 Thread Dor Laor
It's a high number; your compaction may be running behind, leaving many small sstables. However, you're also counting network connections in the calculation (everything in *nix is a file). If it makes you feel better my laptop has 40k open files for Chrome.. On Sun, Jan 21, 2018 at 11:59

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Dor Laor
On Tue, Jan 9, 2018 at 11:19 PM, daemeon reiydelle wrote: > Good luck with that. Pcid out since mid 2017 as I recall? > > > Daemeon (Dæmœn) Reiydelle > USA 1.415.501.0198 > > On Jan 9, 2018 10:31 AM, "Dor Laor" wrote: > > Make sure

Re: Meltdown/Spectre Linux patch - Performance impact on Cassandra?

2018-01-09 Thread Dor Laor
Make sure you pick instances with PCID cpu capability, their TLB flush overhead is much smaller. On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas < thomas.steinmau...@dynatrace.com> wrote: > Quick follow up. > > > > Others in AWS reporting/seeing something similar, e.g.: > https://twit

Re: Full table scan with cassandra

2017-08-17 Thread Dor Laor
st multithreaded app is > not the bottleneck. It is not. > > I expected some kind of elasticity, I see none. Feels like I do something > wrong... > > > > On 17 August 2017 at 00:19, Dor Laor wrote: > >> Hi Alex, >> >> You probably didn't get the

Re: Full table scan with cassandra

2017-08-16 Thread Dor Laor
Hi Alex, You probably didn't get the parallelism right. A serial scan has a parallelism of one. If the parallelism isn't large enough, perf will be slow. If parallelism is too large, Cassandra and the disk will thrash and have too many context switches. So you need to find your cluster's sweet spot. We
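One way to make the parallelism an explicit, tunable knob is a bounded worker pool. The sketch below is only an illustration: `fetch` is a stand-in for whatever function actually runs one range query, and the ranges are placeholders. The right `parallelism` value has to be found by measurement on the actual cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_in_parallel(ranges, fetch, parallelism):
    """Run `fetch` over every range with at most `parallelism` in flight."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map returns results in the same order as `ranges`.
        return list(pool.map(fetch, ranges))


def fake_fetch(token_range):
    """Placeholder for a real range query; returns the width of the range."""
    start, end = token_range
    return end - start


# Placeholder ranges; a real scan would use token sub-ranges of the ring.
ranges = [(0, 10), (10, 25), (25, 30), (30, 100)]
print(scan_in_parallel(ranges, fake_fetch, parallelism=2))
```

Setting `parallelism=1` reproduces the serial scan; raising it step by step while watching throughput and latency is one way to look for the sweet spot.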

Re: EC2 instance recommendations

2017-05-23 Thread Dor Laor
Note that EBS durability isn't perfect; you cannot rely on the volumes entirely: https://aws.amazon.com/ebs/details/ "Amazon EBS volumes are designed for an annual failure rate (AFR) of between 0.1% - 0.2%, where failure refers to a complete or partial loss of the volume, depending on the size and perform
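To put the quoted AFR in perspective, a rough calculation shows how the risk grows with fleet size. This sketch assumes volume failures are independent and that the AFR applies uniformly, which is a simplification; the fleet size of 100 volumes is an arbitrary example.

```python
def p_any_failure(volumes, afr):
    """Probability that at least one of `volumes` fails within a year."""
    return 1.0 - (1.0 - afr) ** volumes


def expected_failures(volumes, afr):
    """Expected number of volume failures per year."""
    return volumes * afr


# The quoted AFR range: 0.1% to 0.2%.
for afr in (0.001, 0.002):
    p = p_any_failure(100, afr)
    print(f"AFR {afr:.1%}: P(at least one loss among 100 volumes) = {p:.1%}")
```

Even a small per-volume rate becomes a meaningful yearly risk across a fleet, which is why replication at the database level still matters on EBS.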

Re: Bootstraping a Node With a Newer Version

2017-05-17 Thread Dor Laor
We've done such an in-place upgrade in the past, but not in real production. However, you're MISSING the point. The root filesystem along with the entire OS should be completely separated from your data directories. It should reside in a different logical volume and thus you can easily change the OS

Re: scylladb

2017-03-14 Thread Dor Laor
On Tue, Mar 14, 2017 at 7:43 AM, Eric Evans wrote: > On Sun, Mar 12, 2017 at 4:01 PM, James Carman > wrote: > > Does all of this Scylla talk really even belong on the Cassandra user > > mailing list in the first place? > > I personally found it interesting, informative, and on-topic when it > wa

Re: scylladb

2017-03-13 Thread Dor Laor
t;>> >>> On Sun, Mar 12, 2017 at 5:04 PM Kant Kodali wrote: >>> >>> yes. >>> >>> On Sun, Mar 12, 2017 at 2:01 PM, James Carman < >>> ja...@carmanconsulting.com> wrote: >>> >>> Does all of this Scylla

Re: scylladb

2017-03-13 Thread Dor Laor
further questions they are welcome to ask on our mailing list or privately. Cheers, Dor On Mon, Mar 13, 2017 at 12:43 AM, Dor Laor wrote: > On Mon, Mar 13, 2017 at 12:17 AM, benjamin roth wrote: > >> @Dor,Jeff: >> >> I think Jeff pointed out an important fact: You cannot s

Re: scylladb

2017-03-12 Thread Dor Laor
>> ScyllaDB isn't a drop in replacement for Cassandra. Saying that it is is >> very misleading. The marketing material should really say something like >> "drop in replacement for some workloads" or "aims to be a drop in >> replacement". As is, it do

Re: scylladb

2017-03-12 Thread Dor Laor
e ANSI standard. As you said, our desire is to be 100% compatible. Btw, going back to technology discussion, while there are lots of reasons to use C++, the only challenge is in features like UDF/triggers which rely on JVM based code execution. We are likely to use Lua for it initially, and later we

Re: scylladb

2017-03-12 Thread Dor Laor
On Sun, Mar 12, 2017 at 6:40 AM, Stefan Podkowinski wrote: > If someone would create a benchmark showing that Cassandra is 10x faster > than Aerospike, would that mean Cassandra is 100x faster than ScyllaDB? > > Joking aside, I personally don't pay a lot of attention to any published > benchmarks

Re: scylladb

2017-03-11 Thread Dor Laor
On Sat, Mar 11, 2017 at 2:19 PM, Kant Kodali wrote: > My response is inline. > > On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity wrote: > >> There are several issues at play here. >> >> First, a database runs a large number of concurrent operations, each of >> which only consumes a small amount of C

Re: scylladb

2017-03-11 Thread Dor Laor
On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa wrote: > > > On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote: > > Cassanda vs Scylla is a valid comparison because they both are > compatible. Scylla is a drop-in replacement for Cassandra. > > No, they aren't, and no, it isn't > Jeff is angry with us

Re: scylladb

2017-03-10 Thread Dor Laor
gine is ideal for the larger number of round trips the LWT needs. This is with the Linux TCP stack; once we use our DPDK one, performance will improve further ;) > > On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor wrote: > >> Scylla isn't just about performance too.

Re: scylladb

2017-03-10 Thread Dor Laor
Scylla isn't just about performance too. First, a disclaimer, I am a Scylla co-founder. I respect open source a lot, so you guys are welcome to shush me out of this thread. I only participate to provide value if I can (this is a thread about Scylla and our users are on our mailing list). Scylla i

Re: How many vnodes should I use for each node in my cluster?

2016-09-16 Thread Dor Laor
On Fri, Sep 16, 2016 at 11:29 AM, Li, Guangxing wrote: > Hi, > > I have a 3 nodes cluster, each with less than 200 GB data. Currently all > nodes have the default 256 value for num_tokens. My colleague told me that > with the data size I have (less than 200 GB on each node), I should change > num

Re: IO scheduler for SSDs on EC2?

2015-03-15 Thread Dor Laor
On Sun, Mar 15, 2015 at 2:03 PM, Ali Akhtar wrote: > I was watching a talk recently on Elasticsearch performance in EC2, and > they recommended setting the IO scheduler to noop for SSDs. Is that the > case for Cassandra as well, or is it recommended to keep the default > 'deadline' scheduler for