Re: Delete too wide partitions

2023-07-16 Thread Dipan Shah
Hello Sébastien, No, there is no built-in solution to perform such an operation in Cassandra. Thanks, Dipan On Sun, 16 Jul 2023 at 4:03 PM, Sébastien Rebecchi wrote: > Hi everyone > > Is there a way to tell Cassandra to automatically delete a partition when > its size exceeds a given threshold

Delete too wide partitions

2023-07-16 Thread Sébastien Rebecchi
Hi everyone Is there a way to tell Cassandra to automatically delete a partition when its size exceeds a given threshold? Best regards, Sébastien

Re: Optimization for partitions with high number of rows

2023-04-17 Thread Gil Ganz
million cells) from disk >> in 120ms doesn't sound bad. That's a lot of deserialisation to do. If you >> want it to be faster, you can store the number of rows elsewhere if that's >> the only thing you need. >> On 11/04/2023 07:13, Gil Ganz wrote: >>

Re: Optimization for partitions with high number of rows

2023-04-16 Thread Bowen Song via user
cluster, with reads of partitions that are a bit on the bigger side, taking longer than I would expect. Reading entire partition that has ~7 rows, total partition size of 4mb, takes 120ms, I would expect it to take less. This is after major compaction, so there is only one

Re: Optimization for partitions with high number of rows

2023-04-11 Thread Gil Ganz
If you > want it to be faster, you can store the number of rows elsewhere if that's > the only thing you need. > On 11/04/2023 07:13, Gil Ganz wrote: > > Hey > I have a 4.0.4 cluster, with reads of partitions that are a bit on the > bigger side, taking longer than I would ex

Re: Optimization for partitions with high number of rows

2023-04-11 Thread Bowen Song via user
z wrote: Hey I have a 4.0.4 cluster, with reads of partitions that are a bit on the bigger side, taking longer than I would expect. Reading entire partition that has ~7 rows, total partition size of 4mb, takes 120ms, I would expect it to take less. This is after major compaction, so th

Optimization for partitions with high number of rows

2023-04-10 Thread Gil Ganz
Hey I have a 4.0.4 cluster, with reads of partitions that are a bit on the bigger side, taking longer than I would expect. Reading entire partition that has ~7 rows, total partition size of 4mb, takes 120ms, I would expect it to take less. This is after major compaction, so there is only one

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Alex Ott
There is also nodetool toppartitions: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/nodetool/toolsToppartitions.html Erick Ramirez at "Mon, 16 Mar 2020 22:44:44 +1100" wrote: ER> How to find which table partitions having the more reads per sstables

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Hossein Ghiyasi Mehr
You can get read count per table (Total and TPS) in JMX. If you want to find hot partitions, you can use nodetool toppartitions without paying money!

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Erick Ramirez
> > How to find which table partitions having the more reads per sstables in > Cassandra? > Your question is unclear. Do you want to know which tables are read the most? If so, you'll need to run nodetool tablestats and parse/sort the output to get the top tables based on read c

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Kiran mk
Yes I am already collecting these. But this does not say anything about the partitions level. Best Regards, Kiran M K On Mon, Mar 16, 2020, 4:04 PM Léo FERLIN SUTTON wrote: > From what I see in the opscenter documentation you can probably get what > you want via the Ops Center Das

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Léo FERLIN SUTTON
jmx) >> > >> > You will have one metric per table, try to find the biggest one. You >> can find more info here : >> http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics >> > >> > On Mon, Mar 16, 2020 at 9:11 AM Kiran mk >>

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Léo FERLIN SUTTON
tric per table, try to find the biggest one. You can > find more info here : > http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics > > > > On Mon, Mar 16, 2020 at 9:11 AM Kiran mk > wrote: > >> > >> Hi All, > >> > >>

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Kiran mk
> I am trying to understand reads per sstables. How to find which >> table partitions having the more reads per sstables in Cassandra? >> >> >> -- >> Best Regards, >> Kiran.M.K. >> >>

Re: How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Léo FERLIN SUTTON
All, > > I am trying to understand reads per sstables. How to find which > table partitions having the more reads per sstables in Cassandra? > > > -- > Best Regards, > Kiran.M.K. > > - > T

How to find which table partitions having the more reads per sstables ?

2020-03-16 Thread Kiran mk
Hi All, I am trying to understand reads per sstables. How to find which table partitions having the more reads per sstables in Cassandra? -- Best Regards, Kiran.M.K.

Re: oversized partition detection ? monitoring the partitions growth ?

2019-11-01 Thread Chris Lohfink
You can set compaction_large_partition_warning_threshold_mb and alert on logs . Writing large partition {}/{}:{} ({}) to sstable {} Chri
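A rough sketch of what alerting on that log message could look like, in Python. The message template is the one quoted above; the sample log line, its prefix, and the names `PATTERN` and `find_large_partitions` are made up for illustration, and the exact wording may differ between Cassandra versions.

```python
import re

# Built from the template quoted above:
#   Writing large partition {}/{}:{} ({}) to sstable {}
# The field names are assumptions about what each placeholder holds.
PATTERN = re.compile(
    r"Writing large partition (?P<keyspace>[^/\s]+)/(?P<table>[^:]+):"
    r"(?P<key>.*) \((?P<size>[^()]+)\) to sstable (?P<sstable>\S+)"
)

def find_large_partitions(lines):
    """Return one dict per log line that matches the warning template."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append(match.groupdict())
    return hits

# Hypothetical log lines, not copied from a real system.log.
sample = [
    "WARN  [CompactionExecutor:1] Writing large partition ks1/events:user42 "
    "(150.0MiB) to sstable /data/ks1/events/mc-1-big-Data.db",
    "INFO  [main] Startup complete",
]

for hit in find_large_partitions(sample):
    print(hit["keyspace"], hit["table"], hit["key"], hit["size"])
```

In practice the lines would come from system.log rather than a list, and the output would feed whatever alerting system is already in place.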

Re: oversized partition detection ? monitoring the partitions growth ?

2019-10-31 Thread Eric LELEU
Hi, I'm not sure that you are able to log which partition has reached 100MB but you may monitor the "EstimatedPartitionSizeHistogram" and take the max value (or 99ct, 95ct) to trigger an alert using your monitoring system. http://cassandra.apache.org/doc/latest/operating/metrics.html#table-me

oversized partition detection ? monitoring the partitions growth ?

2019-10-31 Thread jagernicolas
Hi, how can I detect a partition that reaches the 100MB ? is it possible to log the size of every partition one time per day ? regards, Nicolas Jäger

Re: What happens to empty partitions?

2019-05-17 Thread Carl Mueller
Eventually compaction will remove the row when the sstable is merged/rewritten. On Fri, May 17, 2019 at 8:06 AM Tom Vernon wrote: > Hi, I'm having trouble getting my head around what happens to a partition > that no longer contains any data. As TTL is applied at the column level > (but not on th

What happens to empty partitions?

2019-05-17 Thread Tom Vernon
Hi, I'm having trouble getting my head around what happens to a partition that no longer contains any data. As TTL is applied at the column level (but not on the primary key), if I insert all values with a TTL then all of those values will be tombstoned and eventually purged once they reach that TT

Re: Large partitions

2018-09-13 Thread Jonathan Haddad
On Thu, Sep 13, 2018 at 9:47 AM Mun Dega wrote: > I disagree. > > We had several over 150MB in 3.11 and we were able to break cluster doing > r/w from these partitions in a short period of time. > > On Thu, Sep 13, 2018, 12:42 Gedeon Kamga wrote: > >> Folks,

Re: Large partitions

2018-09-13 Thread Mun Dega
I disagree. We had several over 150MB in 3.11 and we were able to break cluster doing r/w from these partitions in a short period of time. On Thu, Sep 13, 2018, 12:42 Gedeon Kamga wrote: > Folks, > > Based on the information found here > https://docs.datastax.com/en/dse-planning/

Re: Large partitions

2018-09-13 Thread Alexander Dejanovski
Hi Gedeon, you should check Robert Stupp's 2016 talk about large partitions : https://www.youtube.com/watch?v=N3mGxgnUiRY Cheers, On Thu, Sep 13, 2018 at 6:42 PM Gedeon Kamga wrote: > Folks, > > Based on the information found here > https://docs.datastax.com/en/dse-plan

Large partitions

2018-09-13 Thread Gedeon Kamga
*Write *is very slow because the partitions on some tables are over 100MB. I know for a fact that this rule has changed since 2.2. Starting Cassandra 2.2 and up, the new rule of thumb for partition size is *a few hundreds MB*, given the improvement on the architecture. Now, I am unable to find the

updating old partitions in STCS

2018-08-04 Thread onmstester onmstester
I read in some best-practice documents on data modeling that one should not update old partitions while using STCS. But I always use cluster keys in my queries and cqlsh-tracing reports that it only accesses sstables with data having specified cluster key (not all sstables containing part of partition

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
Should i run compaction after changing column_index_size_in_kb? Sent using Zoho Mail On Sun, 20 May 2018 15:06:57 +0430 onmstester onmstester wrote I've increased column_index_size_in_kb to 512 and then 4096 : no change in response time, it even got wor

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
I've increased column_index_size_in_kb to 512 and then 4096 : no change in response time, it even got worse. Even increasing Key cache size and Row cache size did not help. Sent using Zoho Mail On Sun, 20 May 2018 08:52:03 +0430 Jeff Jirsa wrote Column in

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
Data spread between a SSD disk and a 15K disk. the table has 26 tables totally. I haven't try tracing, but i will and inform you! Sent using Zoho Mail On Sun, 20 May 2018 08:26:33 +0430 Jonathan Haddad wrote What disks are you using? How many sstables a

Re: Reading from big partitions

2018-05-19 Thread Jeff Jirsa
t data i end up with some hundred > partitions having more than 300MB size. Reading any sequence of data > from these partitions took about 5 seconds while reading from other > partitions (with less than 50MB sizes) took less than 10ms. > Since i can't change the data model in sake

Re: Reading from big partitions

2018-05-19 Thread Jonathan Haddad
What disks are you using? How many sstables are you hitting? Did you try tracing the request? On Sat, May 19, 2018 at 8:43 PM onmstester onmstester wrote: > Hi, > Due to some unpredictable behavior in input data i end up with some > hundred partitions having more than 300MB size. Re

Reading from big partitions

2018-05-19 Thread onmstester onmstester
Hi, Due to some unpredictable behavior in input data i end up with some hundred partitions having more than 300MB size. Reading any sequence of data from these partitions took about 5 seconds while reading from other partitions (with less than 50MB sizes) took less than 10ms. Since i can'

Re: Large Partitions

2018-04-02 Thread shalom sagges
e partition' The message could be different for > the c* version you're using though. Plus, this doesn't show you all of the > large partitions. > > There is a nice tool that analyzes sstables and can show the large > partitions: > https://github.com/tolbertam/s

Re: Large Partitions

2018-04-02 Thread Ali Hubail
system.log should show you some warnings about wide rows. Do a grep on system.log for 'Writing large partition' The message could be different for the c* version you're using though. Plus, this doesn't show you all of the large partitions. There is a nice tool that analy

Large Partitions

2018-04-02 Thread shalom sagges
Hi All, I ran nodetool cfstats (v2.0.14) on a keyspace and found that there are a few large partitions. I assume that since "Compacted partition maximum bytes": 802187438 (~800 MB) and since "Compacted partition mean bytes": 100465 (~100 KB), it means that most partitions

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Carlos Rolo
idea to assign > a UUID to a clustering key, or would a timestamp be a better choice? I am > thinking that partitions need to keep some sort of binary index for the > clustering keys and for relatively large partitions it can be relatively > expensive to maintain. > > F Javier Par

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
to a table to minimize fragmentation and increase the speed of the insertions. In the Cassandra world, does the same apply to the clustering key? For example, is it a good idea to assign a UUID to a clustering key, or would a timestamp be a better choice? I am thinking that partitions need to keep

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Jeff Jirsa
thing I didn't > know and highly interesting to know more about! > > > We do a lot "by partition". We build column indexes by partition. We update the partition index on each partition. We invalidate key cache by partition. They're not super expensive, but they take tim

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
ctively a tuple of the token/hash and the > underlying key itself), so having more than 2^64 partitions won’t hurt > anything in theory > > That said, having that many partitions would be an incredibly huge data > set, and unless modeled properly, would be very likely to be unwie

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Carlos Rolo
lo On Wed, Mar 7, 2018 at 2:36 PM, Jeff Jirsa wrote: > There is no limit > > The token range of murmur3 is 2^64, but Cassandra properly handles token > overlaps (we use a key that’s effectively a tuple of the token/hash and the > underlying key itself), so having more than 2^64 pa

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Jeff Jirsa
There is no limit The token range of murmur3 is 2^64, but Cassandra properly handles token overlaps (we use a key that’s effectively a tuple of the token/hash and the underlying key itself), so having more than 2^64 partitions won’t hurt anything in theory That said, having that many
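The arithmetic behind the 2^64 figure, and the "tuple of token and key" idea, can be checked in a few lines of Python. This is only an illustration: it assumes tokens are signed 64-bit integers, and the tuples below are a toy model, not Cassandra's internal representation.

```python
# Assumption: Murmur3 tokens are signed 64-bit integers.
TOKEN_MIN = -2**63
TOKEN_MAX = 2**63 - 1

token_count = TOKEN_MAX - TOKEN_MIN + 1
print(token_count == 2**64)      # True
print(token_count == 2 * 2**63)  # True, same number written differently

# Toy model of the "(token, key)" tuple: two different keys that happen to
# share a token still compare as different entries.
entry_a = (42, b"key-a")
entry_b = (42, b"key-b")
store = {entry_a: "row for key-a", entry_b: "row for key-b"}
print(len(store))  # 2
```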

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Thank you Rahul, but is it a good practice to use a large range here? Or would it be better to create partitions with more than 1 row (by using a clustering key)? From a data query point of view I will be accessing the rows by a UID one at a time. F Javier Pareja On Wed, Mar 7, 2018 at 11:12

Re: [External] Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Tom van der Woerdt
Hi Javier, When our users ask this question, I tend to answer "keep it above a billion". More partitions is better. I'm not aware of any actual limits on partition count. Practically it's almost always limited by the disk space in a server. Tom van der Woerdt Site

Re: Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Rahul Singh
The range is 2*2^63 -- Rahul Singh rahul.si...@anant.us Anant Corporation On Mar 7, 2018, 6:06 AM -0500, Javier Pareja , wrote: > Hello all, > > I have been trying to find an answer to the following but I have had no luck > so far: > Is there any limit to the number of partitio

Is there any limit in the number of partitions that a table can have

2018-03-07 Thread Javier Pareja
Hello all, I have been trying to find an answer to the following but I have had no luck so far: Is there any limit to the number of partitions that a table can have? Let's say a table has a partition key and no clustering key, is there a recommended limit on the number of values that

Re: multiple tables vs. partitions and TTL

2018-02-01 Thread Alain RODRIGUEZ
y, other data for 3-5 years. > > The choice now is to have one table only with TTL per partition and > partitions per deletion month (when the data should be deleted) > which will allow a single delete command, followed by a compaction > or alternatively to have multiple tables (o

Re: multiple tables vs. partitions and TTL

2018-02-01 Thread James Shaw
3-5 years. > > The choice now is to have one table only with TTL per partition and > partitions per deletion month (when the data should be deleted) > which will allow a single delete command, followed by a compaction > or alternatively to have multiple tables (one per month whe

multiple tables vs. partitions and TTL

2018-02-01 Thread Marcus Haarmann
her data for 3-5 years. The choice now is to have one table only with TTL per partition and partitions per deletion month (when the data should be deleted) which will allow a single delete command, followed by a compaction or alternatively to have multiple tables (one per month when the delet

Re: TWCS on partitions spanning multiple time windows

2017-12-14 Thread Jeff Jirsa
ean it can’t purge out most of the expired data. Overlaps are almost always caused by read repair, not by partitions spanning windows, unless you’re writing some data with ttls and some without, writing with USING TIMESTAMP to write data that doesn’t match “now” > Scenario 2: > &

TWCS on partitions spanning multiple time windows

2017-12-14 Thread Hannu Kröger
Hi, I have been reading a bit about TWCS to understand how it functions. Current assumption: TWCS uses same tombstone checks as any other compaction strategy to make sure that it doesn’t remove tombstones unless it is safe to do so. Scenario 1: So let’s assume that I have a tables like this: C

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
>>>> > >>>> > Well, I’m guessing that Cassandra doesn't really know if the range >>>> > tombstone is useful for this or not. >>>> > >>>> > In many cases it might be that the partition contains data that is >>>

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
>>>>> within the range of the tombstone but is newer than the tombstone and >>>>> therefore it might be still be returned. Scanning through deleted data can >>>>> be avoided by reading the partition in reverse (if all the deleted data is >>>>> in

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Scanning through deleted data can be >>> > avoided by reading the partition in reverse (if all the deleted data is >>> > in the beginning of the partition). Eventually you will still end up >>> > reading a lot of tombstones but you will get a lot of live data first an

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
ewer than the tombstone and >>>> therefore it might be still be returned. Scanning through deleted data can >>>> be avoided by reading the partition in reverse (if all the deleted data is >>>> in the beginning of the partition). Eventually you will still end up >>

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
>>> reading a lot of tombstones but you will get a lot of live data first and >>> the implicit query limit of 1 probably is reached before you get to the >>> tombstones. Therefore you will get an immediate answer. >>> > >>> > Does it make sense?

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
> implicit query limit of 1 probably is reached before you get to the >> > tombstones. Therefore you will get an immediate answer. >> > >> > Does it make sense? >> > >> > Hannu >> > >> >> On 16 May 2017, at 16:33, Stefano Ortolani

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Nitan Kainth
can be >>> > avoided by reading the partition in reverse (if all the deleted data is >>> > in the beginning of the partition). Eventually you will still end up >>> > reading a lot of tombstones but you will get a lot of live data first and >>> > th

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
l still end up >>> reading a lot of tombstones but you will get a lot of live data first and >>> the implicit query limit of 1 probably is reached before you get to the >>> tombstones. Therefore you will get an immediate answer. >>> > >>> > Does it ma

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Nitan Kainth
ding >> > a lot of tombstones but you will get a lot of live data first and the >> > implicit query limit of 1 probably is reached before you get to the >> > tombstones. Therefore you will get an immediate answer. >> > >> > Does it make sense?

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
ediate answer. >> > >> > Does it make sense? >> > >> > Hannu >> > >> >> On 16 May 2017, at 16:33, Stefano Ortolani wrote: >> >> >> >> Hi all, >> >> >> >> I am seeing inconsistencies

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
t an immediate answer. > > > > Does it make sense? > > > > Hannu > > > >> On 16 May 2017, at 16:33, Stefano Ortolani >> <mailto:ostef...@gmail.com>> wrote: > >> > >> Hi all, > >> > >> I am seeing inconsistencies whe

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
> >> Hannu >> >>> On 16 May 2017, at 16:33, Stefano Ortolani wrote: >>> >>> Hi all, >>> >>> I am seeing inconsistencies when mixing range tombstones, wide partitions, >>> and reverse iterators. >>> I still

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
> > > Does it make sense? > > > > Hannu > > > >> On 16 May 2017, at 16:33, Stefano Ortolani wrote: > >> > >> Hi all, > >> > >> I am seeing inconsistencies when mixing range tombstones, wide > partitions, and reverse iterators.

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Nitan Kainth
e sense? > > Hannu > >> On 16 May 2017, at 16:33, Stefano Ortolani wrote: >> >> Hi all, >> >> I am seeing inconsistencies when mixing range tombstones, wide partitions, >> and reverse iterators. >> I still have to understand if the behaviour is to

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
ached before you get to the tombstones. Therefore you will get an immediate answer. Does it make sense? Hannu > On 16 May 2017, at 16:33, Stefano Ortolani wrote: > > Hi all, > > I am seeing inconsistencies when mixing range tombstones, wide partitions, > and reverse itera

Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
Hi all, I am seeing inconsistencies when mixing range tombstones, wide partitions, and reverse iterators. I still have to understand if the behaviour is to be expected hence the message on the mailing list. The situation is conceptually simple. I am using a table defined as follows: CREATE

Most used token or partitions

2017-03-22 Thread D. Salvatore
Hi, I am looking for a way to retrieve some statistics about the usage of the data on my Cassandra cluster. Ideally I would like to retrieve a list of the most used token ranges or partitions over time through JMX or other similar ways. I found that nodetool has the option "toppartitio

Re: Problems with large partitions and compaction

2017-02-15 Thread Dan Kinder
What Cassandra version? CMS or G1? What are your timeouts set to? "GC activity" - Even if there isn't a lot of activity per se maybe there is a single long pause happening. I have seen large partitions cause lots of allocation fast. Looking at SSTable Levels in nodetool cfstats

Problems with large partitions and compaction

2017-02-14 Thread John Sanda
I have a table that uses LCS and has wound up with partitions upwards of 700 MB. I am seeing lots of the large partition warnings. Client requests are subsequently failing. The driver is not reporting timeout exception, just NoHostAvailableExceptions (in the logs I have reviewed so far). I know

Re: understanding partitions

2016-09-23 Thread Firdousi Farozan
Hi, One more thing to consider is wide partition. Even though theoretically Cassandra supports wide rows, practical limit is max 100 MB per partition. So based on your use-case and model, you may have to split the data into partitions so that wide partitions are not created. Regards, Firdousi

Re: understanding partitions and # of nodes

2016-09-22 Thread Jens Rantil
By "partitions" I assume you refer to "partition keys". Generally, the more partitions keys, the better. Having more partition keys means your data generally is spread out more evenly across the cluster, makes repairs run faster (or so I've heard), makes adding new nodes

Re: understanding partitions and # of nodes

2016-09-21 Thread Jeff Jirsa
It If you only have 100 partitions, then having more than (100 * RF) nodes doesn’t help you much. However, unless you’re using very specific partitioners, there’s no guarantee that you’ll have 1 partition per node (with 10 nodes / 10 partitions). Cassandra uses murmur3 hash (by default
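A small Python sketch of why hashing gives no guarantee of one partition per node. It is a simplification under stated assumptions: MD5 stands in for Murmur3, placement is a plain modulo rather than a token ring, replication is ignored, and `node_for` and `placement` are made-up helper names.

```python
import hashlib
from collections import Counter

def node_for(key: str, node_count: int) -> int:
    # Stand-in hash (MD5), NOT Cassandra's Murmur3 partitioner.
    # Modulo placement is a simplification of token-range ownership.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count

def placement(keys, node_count):
    """Count how many of the given partition keys land on each node."""
    return Counter(node_for(key, node_count) for key in keys)

# 10 hypothetical partition keys spread over 10 nodes.
keys = [f"partition-{i}" for i in range(10)]
counts = placement(keys, 10)

# Nodes missing from the output received no partition at all.
print(sorted(counts.items()))
```

With so few keys, a hashed placement will usually leave some nodes holding several partitions and others holding none, which is the point being made about 10 nodes and 10 partitions.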

understanding partitions and # of nodes

2016-09-21 Thread S Ahmed
Hello, If you have a 10 node cluster, how does having 10 partitions or 100 partitions change how cassandra will perform? With 10 partitions you will have 1 partition per node. With 100 partitions you will have 10 partitions per node. With 100 partitions I guess it helps because when you add

understanding partitions

2016-09-21 Thread S Ahmed
Hello, If you have a 10 node cluster, how does having 10 partitions or 100 partitions change how cassandra will perform? With 10 partitions you will have 1 partition per node. With 100 partitions you will have 10 partitions per node. With 100 partitions I guess it helps because when you add

Spark partitions from CassandraRDD

2015-09-02 Thread Alaa Zubaidi (PDF)
will generate few partitions. However, I can ONLY see 1 partition. I cached the CassandraRDD and in the UI storage tab it shows ONLY 1 partition. Any idea, why I am getting 1 partition? Thanks, Alaa

Re: RDD partitions per executor in Cassandra Spark Connector

2015-03-03 Thread Carl Yeksigian
't find the *issues* button on > https://github.com/datastax/spark-cassandra-connector/ so posting here. > > Any one have an idea why token ranges are grouped into one partition per > executor? I expected at least one per core. Any suggestions on how to work > around this? Doing a repartit

Re: RDD partitions per executor in Cassandra Spark Connector

2015-03-03 Thread Pavel Velikhov
ed at least one per core. Any suggestions on how to work > around this? Doing a repartition is way too expensive as I just want more > partitions for parallelism, not reshuffle ... > > Thanks in advance! > Frens Jan

RDD partitions per executor in Cassandra Spark Connector

2015-03-02 Thread Rumph, Frens Jan
ing a repartition is way too expensive as I just want more partitions for parallelism, not reshuffle ... Thanks in advance! Frens Jan

Re: data distribution along column family partitions

2015-02-04 Thread Marcelo Valle (BLOOMBERG/ LONDON)
From: clohfin...@gmail.com Subject: Re: data distribution along column family partitions > not ok :) don't let a single partition get to 1gb, 100's of mb should be when > flares are going up. The main reasoning is compactions would be horrifically > slow and there will

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
cate an entire (large/wide) partition into memory unless you're telling it to on a read. (gross simplification coming up here) You can think of it more as if it's streaming the partition's data from disk (more or less) to fill a response to your query. Don't ask for 1gb of data and you won'

Re: data distribution along column family partitions

2015-02-04 Thread Marcelo Valle (BLOOMBERG/ LONDON)
> The data model lgtm. You may need to balance the size of the time buckets > with the amount of alarms to prevent partitions from getting too large. 1 month may be a little large, I would aim to keep the partitions below 25mb (can check with nodetool cfstats) or so in size to keep ever

Re: data distribution along column family partitions

2015-02-04 Thread Chris Lohfink
The data model lgtm. You may need to balance the size of the time buckets with the amount of alarms to prevent partitions from getting too large. 1 month may be a little large, I would aim to keep the partitions below 25mb (can check with nodetool cfstats) or so in size to keep everything happy

data distribution along column family partitions

2015-02-04 Thread Marcelo Elias Del Valle
nother question is: would data be distributed enough if I just choose to partition by user-id? I will have some users with a large number of alerts, but on average I could consider alerts would have a good distribution along user ids. The problem is I don't feel confident having few partitions wi

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-06 Thread Eric Stevens
for some multiple of your typical read ranges (eg, if you typically would query for all objects within a day, bucket might be 1 or 2 hours, if you typically query by hour, perhaps bucket is 10 minutes, etc.). Practically speaking, depending on your hardware you'll want to try to keep your partiti
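The bucketing idea above can be sketched in Python as a function that maps a timestamp to the start of its time bucket, which could then serve as part of the partition key. The name `bucket_start` and the choice to align buckets to the Unix epoch are assumptions made for this sketch, not something the thread prescribes.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Return the start of the epoch-aligned bucket of the given width
    that contains ts."""
    # timedelta // timedelta yields the whole number of buckets since EPOCH.
    buckets_since_epoch = (ts - EPOCH) // width
    return EPOCH + buckets_since_epoch * width

event_time = datetime(2014, 12, 6, 13, 47, tzinfo=timezone.utc)

# Queries typically span a day: bucket by 2 hours.
print(bucket_start(event_time, timedelta(hours=2)))

# Queries typically span an hour: bucket by 10 minutes.
print(bucket_start(event_time, timedelta(minutes=10)))
```

A read for a whole day would then touch 12 two-hour buckets, so the bucket width trades partition size against the number of partitions each query has to visit.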

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread DuyHai Doan
o narrow down the set > of SSTables it needs to read if you request a specific clustering column > value. However, in your example, this isn't likely to narrow things down > much, so it will have to check many more SSTables. > > >> >> It’s not clear to me which will

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Tyler Hobbs
ample, this isn't likely to narrow things down much, so it will have to check many more SSTables. > > It’s not clear to me which will fit more efficiently on disk, but I > would guess that table a wins. > They're probably close enough not to matter very much. > > S

Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Robert Wille
At the data modeling class at the Cassandra Summit, the instructor said that lots of small partitions are just fine. I’ve heard on this list that that is not true, and that it's better to cluster small partitions into fewer, larger partitions. Due to conflicting information on this issue, I’d be

Ordering between partitions.

2014-10-10 Thread Aleksandar Stojadinovic
ed in different partitions in an ordered manner. Using a Java client it looks like it is not really possible. Should we order the values in code or select all the primary keys with an IN keyword (but it seems we lose the pagination option then)? Is there a common pattern for this situation? All in all, t

Re: unreadable partitions

2014-09-29 Thread Robert Coli
On Sun, Sep 28, 2014 at 3:45 AM, tommaso barbugli wrote: > I see some data stored in Cassandra (2.0.7) being not readable from CQL; > this affects entire partitions, querying this partitions raise a Java > exception: > If the SSTable is not corrupt but is not readable via CQL and

unreadable partitions

2014-09-28 Thread tommaso barbugli
Hi, I see some data stored in Cassandra (2.0.7) that is not readable from CQL; this affects entire partitions, and querying these partitions raises a Java exception: ERROR [ReadStage:540638] 2014-09-28 12:40:38,992 CassandraDaemon.java (line 198) Exception in thread Thread[ReadStage:540638,5,main

Re: Is there anyone who implemented time range partitions with column families?

2013-06-17 Thread Robert Coli
On Wed, May 29, 2013 at 9:33 AM, Hiller, Dean wrote: > QUESTION: I am assuming 10 compactions should be enough to put enough load > on the disk/cpu/ram etc. etc. or do you think I should go with 100CF's. > 98% of our data is all in this one CF. Compaction can only really efficiently multi-thread

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
Wednesday, May 29, 2013 10:01 AM >To: "user@cassandra.apache.org" >Subject: Re: Is there anyone who implemented time range partitions with >column families? > >Thank you very much for the fast answer

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Jabbar Azam
in time period (rather than using C*'s TTL), and write data >> to the different CFs as needed. >> >> ~Jeremy >> >> On Wed, May 29, 2013 at 8:36 AM, cem wrote: >> >>> Hi All, >>> >>> I used time range partitions 5 years ago

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
@cassandra.apache.org" Subject: Re: Is there anyone who implemented time range partitions with column families? Thank you very much for the fast answer. Does playORM use different column families for each partition in Cassand

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread cem
er than using C*'s TTL), and write data > to the different CFs as needed. > > ~Jeremy > > On Wed, May 29, 2013 at 8:36 AM, cem wrote: > >> Hi All, >> >> I used time range partitions 5 years ago with MySQL to clean up data much >> faster. >> >

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Jeremy Powell
our system, programmatically drop CFs if/when they are outside a certain time period (rather than using C*'s TTL), and write data to the different CFs as needed. ~Jeremy On Wed, May 29, 2013 at 8:36 AM, cem wrote: > Hi All, > > I used time range partitions 5 years ago with MySQL to

Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread cem
Hi All, I used time range partitions 5 years ago with MySQL to clean up data much faster. I had a big FACT table with time range partitions and it was very easy to drop old partitions (with archiving) and do some saving on disk. Has anyone implemented such a thing in Cassandra? It would be great

Nodetool move failure, no data partitions determined

2011-12-25 Thread RobinUs2
I was moving around some nodes in my cluster but when I get to one node an error appears: "Error during move: The data partitions for node [IP] have not been determined" How to solve this problem?

Partitions

2010-12-24 Thread David G. Boney
I am using the Hadoop interface with Cassandra. Is it possible to line up partitions or splits of two different column families to be on the same node? I am doing this for data locality reasons. I want to read all the data from a split of column family A and a split from column family B into