It's possible to manage Cassandra well both with VMs and containers. As you'd be running one container per VM, there is no significant advantage for containers. K8s provides nice tooling and some methodological enforcement which brings order to the setup, but if the team isn't made up of top-notch experts in
On Tue, May 31, 2022 at 4:40 PM Andria Trigeorgi wrote:
> Hi,
>
> I want to write large blobs in Cassandra. However, when I tried to write
> more than a 256MB blob, I got the message:
> "Error from server: code=2200 [Invalid query] message=\"Request is too
> big: length 268435580 exceeds maximum
select * reads all of the data from the cluster; obviously it would be bad if you run a single query and expect it to return 'fast'. The best way is to divide the data set into chunks selected by the range ownership per node, so you'll be able to query the entire cluster in parallel.
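A minimal sketch of that idea, assuming the DataStax Python driver and a hypothetical table ks.events with partition key id: it slices the Murmur3 token space into fixed-size ranges and scans them concurrently (real splits would follow each node's actual token ownership):

from concurrent.futures import ThreadPoolExecutor
from cassandra.cluster import Cluster

# Hypothetical contact point, keyspace and table.
session = Cluster(['127.0.0.1']).connect()

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1   # Murmur3Partitioner token space
SPLITS = 64                                # tune to cluster size and desired parallelism

def scan_range(lo, hi):
    # Each sub-query only touches the replicas that own this token slice.
    return list(session.execute(
        "SELECT * FROM ks.events WHERE token(id) >= %s AND token(id) < %s",
        (lo, hi)))

step = (MAX_TOKEN - MIN_TOKEN) // SPLITS
bounds = [MIN_TOKEN + i * step for i in range(SPLITS)] + [MAX_TOKEN]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = pool.map(scan_range, bounds[:-1], bounds[1:])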
If it's helpful, IMO, the approach Cassandra needs to take isn't by tracking the individual node commit log and putting the burden on the client. At Scylla, we had the 'opportunity' to be a latecomer and see what approach Cassandra took and what DynamoDB streams took.
We've implemented CDC as a r
In your schema case, for each client_id you will get a single 'when'
row. Just one. Even when there are multiple rows (clustering keys)
On Thu, May 7, 2020 at 12:14 AM Check Peck wrote:
>
> I have a scylla table as shown below:
>
>
> cqlsh:sampleks> describe table test;
>
>
> CREATE TABLE
to-lock-the-pages-of-a-process-in-memory
>
>
> Thanks
> Kunal
>
> On Thu, Apr 16, 2020 at 4:31 PM Dor Laor wrote:
>>
>> It is good to configure swap for the OS but exempt Cassandra
>> from swapping. Why is it good? Since you never know the
>> memory utiliz
It is good to configure swap for the OS but exempt Cassandra from swapping. Why is it good? Because you never know the memory utilization of additional agents and processes you or other admins will run on your server.
So do configure a swap partition.
You can control the eagerness of the kernel by tuning the vm.swappiness sysctl.
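For example, a minimal sketch (assuming Linux and a hypothetical Cassandra PID) that reads how eagerly the kernel will swap and whether the Cassandra process itself currently has pages in swap:

from pathlib import Path

def swappiness():
    # 0 = avoid swapping as much as possible, 60 = common kernel default
    return int(Path('/proc/sys/vm/swappiness').read_text())

def swapped_kib(pid):
    # VmSwap in /proc/<pid>/status reports how much of the process is swapped out
    for line in Path(f'/proc/{pid}/status').read_text().splitlines():
        if line.startswith('VmSwap:'):
            return int(line.split()[1])
    return 0

cassandra_pid = 12345  # hypothetical; locate it with e.g. pgrep -f CassandraDaemon
print('vm.swappiness =', swappiness())
print('Cassandra swapped KiB =', swapped_kib(cassandra_pid))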
Another option is to use the Spark migrator; it reads a source CQL cluster and writes to another. It has a validation stage that compares a full scan and reports the diff:
https://github.com/scylladb/scylla-migrator
There are many more ways to clone a cluster. My main recommendation is to 'optimiz
Another option instead of raw sstables is to use the Spark Migrator [1]. It reads a source cluster, can make some transformations (like table/column renaming) and writes to a target cluster. It's a very convenient tool, OSS and free of charge.
[1] https://github.com/scylladb/scylla-migrator
On Fri,
he, you’re
>> also benefitting from the decompression. However I’ve started to wonder
>> how often sstable compression is worth the performance drag and internal C*
>> complexity. If you compare to where a more traditional RDBMS would use
>> compression, e.g. Postgres, use
The DynamoDB model has several key benefits over Cassandra's. The most notable one is the tablet concept: data is partitioned into 10GB chunks, so scaling happens when such a tablet reaches maximum capacity and it is automatically divided in two. It can happen in parallel across the entire data set
rom the realtime workload for
isolation and low latency guarantees.
We addressed this problem elsewhere, beyond this scope.
>
>
>
> Sean Durity
>
>
>
> *From:* Dor Laor
> *Sent:* Friday, January 04, 2019 4:21 PM
> *To:* user@cassandra.apache.org
> *Subject:
a bit with a multi-DC setup with docker images and repeat it with a test dataset until you are confident about the commands and their outcome. This example should work with Cassandra:
https://www.scylladb.com/2018/03/28/mms-day7-multidatacenter-consistency/
On Sat, Jan 5, 2019 at 5:57 AM R1 J1 wr
Not sure I understand correctly, but if you have one cluster with 2 separate datacenters you can define keyspace A to be on premise with a single DC and keyspace B only on Azure.
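A minimal sketch of that layout (hypothetical DC and keyspace names), using per-keyspace NetworkTopologyStrategy settings so each keyspace is replicated only in its own datacenter:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

# Keyspace A lives only in the on-prem datacenter.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS keyspace_a
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_onprem': 3}
""")

# Keyspace B lives only in the Azure datacenter.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS keyspace_b
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_azure': 3}
""")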
On Fri, Jan 4, 2019 at 2:23 PM R1 J1 wrote:
> We currently have 2 databases (A and B ) on a 6 node cluster.
> 3 nod
I strongly recommend option B, separate clusters. Reasons:
- Networking node-to-node is negligible compared to networking within the node
- Different scaling considerations: your workload may require 10 Spark nodes and 20 database nodes, so why bundle them? This ratio may also change over time
An alternative approach is to form another new cluster and leave the original cluster alive (many times it's a must since it needs to be 24x7 online). Double write to the two clusters and later migrate the data to the new one, either by taking a snapshot and passing those files to the new cluster or with sstableloader
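A minimal dual-write sketch (hypothetical contact points, keyspace and table): every new write goes to both clusters while the historical data is copied in the background:

from cassandra.cluster import Cluster

old_session = Cluster(['10.0.0.1']).connect('ks')   # original, live cluster
new_session = Cluster(['10.1.0.1']).connect('ks')   # new cluster being populated

INSERT = "INSERT INTO events (id, payload) VALUES (%s, %s)"

def dual_write(event_id, payload):
    # The live cluster stays authoritative; a failure here is a real write failure.
    old_session.execute(INSERT, (event_id, payload))
    # Mirror to the new cluster; in production you would queue and retry failures
    # so the two clusters can be reconciled before cutover.
    new_session.execute(INSERT, (event_id, payload))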
e LWT/CAS to guarantee state if you have a data model where it
>> matters.
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Mar 8, 2018, at 6:18 PM, Dor Laor wrote:
>>
>> While NTP on the servers is important, make sure that you use client
>> ti
; matters.
>
> --
> Jeff Jirsa
>
>
> On Mar 8, 2018, at 6:18 PM, Dor Laor wrote:
>
> While NTP on the servers is important, make sure that you use client
> timestamps and
> not server. Since the last write wins, the data generator should be the
> one setting its timestamp.
>
While NTP on the servers is important, make sure that you use client timestamps and not server ones. Since the last write wins, the data generator should be the one setting its timestamp.
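For example, a minimal sketch (hypothetical table) that sets the write timestamp explicitly on the client with USING TIMESTAMP, so last-write-wins conflict resolution follows the data generator's clock:

import time
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')

ts_micros = int(time.time() * 1_000_000)   # client-side timestamp, in microseconds
session.execute(
    "INSERT INTO events (id, payload) VALUES (%s, %s) USING TIMESTAMP %s",
    (42, 'hello', ts_micros))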
On Thu, Mar 8, 2018 at 2:12 PM, Ben Slater
wrote:
> It is important to make sure you are using the same NTP serve
I think you're introducing a layer violation. GDPR is a business requirement and compaction is an implementation detail.
IMHO it's enough to delete the partition using regular CQL. It's true that it won't be deleted immediately, but it will be eventually deleted (welcome to eventual consistency ;).
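A minimal sketch (hypothetical table keyed by user_id): a regular CQL delete writes a tombstone, so the data disappears from query results right away and is physically purged from the sstables once compaction runs after gc_grace_seconds:

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('ks')

# Deletes the whole partition for this user; the tombstone shadows the old data
# immediately, and compaction removes it from disk after gc_grace_seconds.
session.execute("DELETE FROM user_data WHERE user_id = %s", ('user-123',))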
It's a high number; your compaction may be running behind and thus many small sstables exist. However, you're also taking the number of network connections into the calculation (everything in *nix is a file). If it makes you feel better, my laptop has 40k open files for Chrome.
On Sun, Jan 21, 2018 at 11:59
On Tue, Jan 9, 2018 at 11:19 PM, daemeon reiydelle
wrote:
> Good luck with that. Pcid out since mid 2017 as I recall?
>
>
> Daemeon (Dæmœn) Reiydelle
> USA 1.415.501.0198
>
> On Jan 9, 2018 10:31 AM, "Dor Laor" wrote:
>
> Make sure
Make sure you pick instances with the PCID CPU capability; their TLB flush overhead is much smaller.
On Tue, Jan 9, 2018 at 2:04 AM, Steinmaurer, Thomas <
thomas.steinmau...@dynatrace.com> wrote:
> Quick follow up.
>
>
>
> Others in AWS reporting/seeing something similar, e.g.:
> https://twit
st multithreaded app is
> not the bottleneck. It is not.
>
> I expected some kind of elasticity, I see none. Feels like I do something
> wrong...
>
>
>
> On 17 August 2017 at 00:19, Dor Laor wrote:
>
>> Hi Alex,
>>
>> You probably didn't get the
Hi Alex,
You probably didn't get the parallelism right. A serial scan has a parallelism of one. If the parallelism isn't large enough, perf will be slow. If parallelism is too large, Cassandra and the disk will thrash and have too many context switches. So you need to find your cluster's sweet spot. We
Note that EBS durability isn't perfect; you cannot rely on the volumes entirely:
https://aws.amazon.com/ebs/details/
"Amazon EBS volumes are designed for an annual failure rate (AFR) of
between 0.1% - 0.2%, where failure refers to a complete or partial loss of
the volume, depending on the size and perform
We've done such an in-place upgrade in the past, but not for a real production system.
However, you're MISSING the point. The root filesystem along with the entire OS should be completely separated from your data directories. It should reside in a different logical volume and thus you can easily change the OS
On Tue, Mar 14, 2017 at 7:43 AM, Eric Evans
wrote:
> On Sun, Mar 12, 2017 at 4:01 PM, James Carman
> wrote:
> > Does all of this Scylla talk really even belong on the Cassandra user
> > mailing list in the first place?
>
> I personally found it interesting, informative, and on-topic when it
> wa
>>>
>>> On Sun, Mar 12, 2017 at 5:04 PM Kant Kodali wrote:
>>>
>>> yes.
>>>
>>> On Sun, Mar 12, 2017 at 2:01 PM, James Carman <
>>> ja...@carmanconsulting.com> wrote:
>>>
>>> Does all of this Scylla
further questions they are welcome to
ask on our mailing list or privately.
Cheers,
Dor
On Mon, Mar 13, 2017 at 12:43 AM, Dor Laor wrote:
> On Mon, Mar 13, 2017 at 12:17 AM, benjamin roth wrote:
>
>> @Dor,Jeff:
>>
>> I think Jeff pointed out an important fact: You cannot s
>> ScyllaDB isn't a drop in replacement for Cassandra. Saying that it is is
>> very misleading. The marketing material should really say something like
>> "drop in replacement for some workloads" or "aims to be a drop in
>> replacement". As is, it do
e ANSI standard. As you said, our desire is to be 100% compatible.
Btw, going back to the technology discussion, while there are lots of reasons to use C++, the only challenge is in features like UDFs/triggers which rely on JVM-based code execution. We are likely to use Lua for it initially, and later we
On Sun, Mar 12, 2017 at 6:40 AM, Stefan Podkowinski wrote:
> If someone would create a benchmark showing that Cassandra is 10x faster
> than Aerospike, would that mean Cassandra is 100x faster than ScyllaDB?
>
> Joking aside, I personally don't pay a lot of attention to any published
> benchmarks
On Sat, Mar 11, 2017 at 2:19 PM, Kant Kodali wrote:
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of C
On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa wrote:
>
>
> On 2017-03-10 09:57 (-0800), Rakesh Kumar wrote:
> > Cassandra vs Scylla is a valid comparison because they both are
> compatible. Scylla is a drop-in replacement for Cassandra.
>
> No, they aren't, and no, it isn't
>
Jeff is angry with us
gine is ideal for the larger number of round trips the LWT needs.
This is with the Linux TCP stack; once we'll use our DPDK one, performance will improve further ;)
>
> On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor wrote:
>
>> Scylla isn't just about performance too.
Scylla isn't just about performance too.
First, a disclaimer, I am a Scylla co-founder. I respect open source a lot,
so you guys are welcome to shush me out of this thread. I only participate
to provide value if I can (this is a thread about Scylla and our users are
on our mailing list).
Scylla i
On Fri, Sep 16, 2016 at 11:29 AM, Li, Guangxing
wrote:
> Hi,
>
> I have a 3 nodes cluster, each with less than 200 GB data. Currently all
> nodes have the default 256 value for num_tokens. My colleague told me that
> with the data size I have (less than 200 GB on each node), I should change
> num
On Sun, Mar 15, 2015 at 2:03 PM, Ali Akhtar wrote:
> I was watching a talk recently on Elasticsearch performance in EC2, and
> they recommended setting the IO scheduler to noop for SSDs. Is that the
> case for Cassandra as well, or is it recommended to keep the default
> 'deadline' scheduler for