Cassandra insert from Spark slows down when running executors on the same node

2018-05-21 Thread Javier Pareja
Hello, I have a Spark Streaming job reading data from Kafka, processing it and inserting it into Cassandra. The job is running on a cluster with 3 machines. I use Mesos to submit the job with 3 executors using 1 core each. The problem is that when all executors are running on the same node, the i

Re: 答复: Time serial column family design

2018-04-17 Thread Javier Pareja
Hi David, Could you describe why you chose to include the create date in the partition key? If the vin is enough "partitioning", meaning that the size (number of rows x size of row) of each partition is less than 100MB, then remove the date and just use the create_time, because the date is already
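The sizing rule in this reply (number of rows x size of row under 100MB per partition) can be checked with a quick estimate. This is a minimal sketch, not a Cassandra tool: the function names, the row counts and the 200-byte row size are hypothetical, "100MB" is assumed to mean 100 x 10^6 bytes, and the estimate ignores compression and storage overhead.

```python
# Rough partition-size check for the "rows x row size < 100MB" guideline.
# All names and numbers here are illustrative assumptions.

PARTITION_LIMIT_BYTES = 100 * 10**6  # "100MB", assumed to be decimal megabytes


def partition_size_bytes(rows: int, row_bytes: int) -> int:
    """Estimate the raw size of one partition as rows x bytes per row."""
    return rows * row_bytes


def fits_in_partition(rows: int, row_bytes: int,
                      limit: int = PARTITION_LIMIT_BYTES) -> bool:
    """Return True when the estimated partition size is under the limit."""
    return partition_size_bytes(rows, row_bytes) < limit


if __name__ == "__main__":
    rows_per_year = 365 * 24 * 60  # hypothetical: one row per minute per vin
    rows_per_day = 24 * 60
    row_bytes = 200                # hypothetical average row size

    # Partitioning by vin only, one year of data in a single partition.
    print(fits_in_partition(rows_per_year, row_bytes))
    # Partitioning by (vin, date), one day of data per partition.
    print(fits_in_partition(rows_per_day, row_bytes))
```

With these made-up figures a year of one-minute rows for a single vin lands slightly over the limit, while a per-day partition stays far below it, so whether the date belongs in the partition key depends on the real write rate and row size.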

Archive cassandra old data into Hadoop

2018-03-12 Thread Javier Pareja
HIVE? What is the standard in these cases? F Javier Pareja

Re: data types storage saving

2018-03-10 Thread Javier Pareja
You can use variable-length zig-zag coding to encode an integer if using a blob. It is used in Avro and protocol buffers. Some examples (value → hex): 0 → 00, -1 → 01, 1 → 02, -2 → 03, 2 → 04, ... -64 → 7f, 64 → 80 01, ... On Sat, 10 Mar 2018, 07:52 onmstester onmstester, wrote: > I've find out that blobs has no gain
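The variable-length zig-zag coding mentioned in this message can be sketched as follows. This is a minimal illustration in the style used by Protocol Buffers (zig-zag mapping followed by a base-128 varint), assuming 64-bit signed inputs. The function names are made up for the example and are not part of any Cassandra or driver API.

```python
# Zig-zag + base-128 varint coding for signed 64-bit integers.
# Function names are illustrative, not taken from any library.


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit int to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(u: int) -> bytes:
    """Encode an unsigned int, 7 bits per byte, low-order group first.

    The high bit of each byte is set when more bytes follow.
    """
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_signed(n: int) -> bytes:
    """Signed int -> bytes suitable for storing in a blob column."""
    return varint_encode(zigzag_encode(n))


def decode_signed(data: bytes) -> int:
    """Bytes produced by encode_signed -> signed int."""
    return zigzag_decode(varint_decode(data))


if __name__ == "__main__":
    for value in (0, -1, 1, -2, 2, -64, 64):
        print(value, encode_signed(value).hex(" "))
```

Small magnitudes of either sign fit in one byte, which is why the values listed in the message go from `7f` for -64 to the two bytes `80 01` for 64.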

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
some sort of binary index for the clustering keys and for relatively large partitions it can be relatively expensive to maintain. F Javier Pareja On Wed, Mar 7, 2018 at 5:20 PM, Jeff Jirsa wrote: > > > On Wed, Mar 7, 2018 at 7:13 AM, Carlos Rolo wrote: > >> Hi Jeff, >>

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
uld I find another field for the partition and use the UUID for the clustering instead? F Javier Pareja On Wed, Mar 7, 2018 at 2:36 PM, Jeff Jirsa wrote: > There is no limit > > The token range of murmur3 is 2^64, but Cassandra properly handles token > overlaps (we use a key that’s effe

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you Rahul, but is it a good practice to use a large range here? Or would it be better to create partitions with more than 1 row (by using a clustering key)? From a data query point of view I will be accessing the rows by a UID one at a time. F Javier Pareja On Wed, Mar 7, 2018 at 11:12

Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
this partition key can have? Is it recommended to have a clustering key to reduce this number by storing several rows in each partition instead of one row per partition? Regards, F Javier Pareja

Re: How do counter updates work?

2018-03-05 Thread Javier Pareja
Doesn't Cassandra have TIMEUUID for these use cases? Anyways, hopefully someone can help me better understand possible delays when writing a counter. F Javier Pareja On Mon, Mar 5, 2018 at 1:54 PM, Hannu Kröger wrote: > Traditionally auto increment counters have been used to generate

Re: How do counter updates work?

2018-03-05 Thread Javier Pareja
Hi Kyrulo, I don't understand how UUIDs are related to counters, but I use counters to increment the value of a cell in an atomic manner. I could try reading the value and then writing to the cell but then I would lose the atomicity of the update. F Javier Pareja On Mon, Mar 5, 2018 at 1:

How do counter updates work?

2018-03-05 Thread Javier Pareja
on about how the counter lock is acquired, is there a shared lock across all the nodes? Hope I am not oversimplifying things, but I think this will be useful to better understand how to tune up the system. Thanks in advance. F Javier Pareja

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
Thank you Jürgen, The default consistency in the library is already ONE. I tried setting it anyways but it made no difference. Hopefully it is a configuration issue, that would be very good news!! Do you have any past/present experience with large counter tables? F Javier Pareja On Fri, Mar 2

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
MiB, capacity 480 MiB, 4635329524 misses, 6673522516 requests, 0.305 recent hit rate, NaN microseconds miss latency Percent Repaired : 0.0% Regards, Javier F Javier Pareja On Fri, Mar 2, 2018 at 7:01 PM, Alain RODRIGUEZ wrote: > Hi Javier, > > The only bottleneck in the writes

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
there are plenty of them) and only share the CPU and RAM. The only bottleneck in the writes as far as I understand it is the commit log. Shall I create RAID0 (for speed) or install an SSD just for the commitlog? Thanks, Javier F Javier Pareja On Fri, Mar 2, 2018 at 12:21 PM, Javier Pareja

Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
Hello everyone, I have configured a Cassandra cluster with 3 nodes; however, I am not getting the write speed that I was expecting. I have tested against a counter table because it is the bottleneck of the system. So with the system idle I run the attached sample code (very simple async writes wit

Re: Cassandra counter readtimeout error

2018-02-19 Thread Javier Pareja
table. - I also enabled tracing in the CQLSH but it showed nothing when querying this row. It however did when querying other tables... Thanks again for your reply!! I am very excited to be part of the Cassandra user base. Javier F Javier Pareja On Mon, Feb 19, 2018 at 8:08 AM, Alain RODRIGUEZ

Cassandra counter readtimeout error

2018-02-17 Thread Javier Pareja
Hello everyone, I get a timeout error when reading a particular row from a large counters table. I have a Storm topology that inserts data into a Cassandra counter table. This table has 6 partition keys, 4 primary keys and 5 counters. When data starts to be inserted, I can query the counters cor