support for nulls in composite lost in CQL3

2013-11-19 Thread Hiller, Dean
We have wide rows whose column names are a composite of integer.byte array, where some of our columns are {empty}.byte array (i.e., the first part of the composite key is empty, as in a 0-length string or 0-length integer; NOT 0, but basically null). This has worked great when we look up all the entries with a emp

Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
ta) and ORM Thanks Dean. I'll check that page out. Les On Wed, Oct 23, 2013 at 7:52 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote: PlayOrm supports different types of wide rows like embedded list in the object, etc. There is a list of nosql patterns mixed with playorm pa

Re: Wide rows (time series data) and ORM

2013-10-23 Thread Hiller, Dean
PlayOrm supports different types of wide rows like embedded list in the object, etc. There is a list of nosql patterns mixed with playorm patterns on this page http://buffalosw.com/wiki/patterns-page/ From: Les Hartzman <lhartz...@gmail.com> Reply-To: "user@cassandra.apache.org

Re: Online shop with Cassandra

2013-10-09 Thread Hiller, Dean
Read the paper "Building on Quicksand", especially the section where he describes what they do at Amazon… the apology model… i.e., allow overbooking and apologize, but limit overbooking… That is one way to go and stay scalable. You may want to analyze the percentage change that overbooking can be as we

Re: How many Column Families can Cassandra handle?

2013-09-26 Thread Hiller, Dean
600 is probably doable, but each CF takes up memory… PlayOrm goes with a strategy that can virtualize CFs into one CF, allowing less memory usage… we have 80,000 virtual CFs in cassandra through playorm… you can copy playorm's pattern if desired. But 600 is probably doable but high. 10,000 is

Re: is this correct, thrift unportable to CQL3…

2013-09-24 Thread Hiller, Dean
value. Even if you don't want to use a prepared statement, CQL3 has conversion functions (http://cassandra.apache.org/doc/cql3/CQL.html#blobFun) that allows to do it (for instance, "blobAsInt(0x)" will be an empty int value). -- Sylvain On Tue, Sep 24, 2013 at 2:36 PM, Hiller, De
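To illustrate the distinction this thread draws between an empty int value and the integer 0, here is a minimal Python sketch. It assumes a CQL int is serialized as 4 big-endian bytes; the variable names are made up for illustration, and nothing here talks to a real cluster.

```python
import struct

# Assumption: a CQL int is serialized as a 4-byte big-endian signed integer.
zero_as_int = struct.pack(">i", 0)

# The "empty" value described in the thread is a zero-length byte string,
# which is what the quoted reply says blobAsInt(0x) yields.
empty_value = b""

print(len(zero_as_int))            # bytes used by the integer 0
print(len(empty_value))            # bytes used by the empty value
print(zero_as_int == empty_value)  # whether the two encodings coincide
```

The point of the sketch is only that "no score" and "a score of 0" can be told apart at the byte level, which is what the composite-with-empty-prefix pattern relies on.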

is this correct, thrift unportable to CQL3…

2013-09-24 Thread Hiller, Dean
Many applications in thrift use the wide row with composite column name and as an example, let's say golf score for instance and we end up with golf score : pk like so null : pk56 null : pk45 89 : pk90 89: pk87 90: pk101 95: pk17 Notice that there are some who do not have a golf score(zero woul

composite with null prefix in CQL3(porting from thrift)

2013-09-23 Thread Hiller, Dean
I ran into this same issue on this stackoverflow post… http://stackoverflow.com/questions/18963248/how-can-i-have-null-column-value-for-a-composite-key-column-in-cql3 Does anyone know how to have the same composite column name pattern that enables wide rows with a null value? Ie. We had some in

Re: Reverse compaction on 1.1.11?

2013-09-19 Thread Hiller, Dean
Can you describe what you mean by reverse compaction? I mean once you put a row together and blow away sstables that contained it before, you can't possibly know how to split it since that information is gone. Perhaps you want the simple sstable2json script in the bin directory so you can inspect

hadoop 12 T recommendation vs. cassandra 1T recommendation

2013-09-18 Thread Hiller, Dean
This article looks like it came out just one month ago or not even http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/ And recommends 12-24 1-4TB disks in a JBOD configuration. I know hadoop is used a lot in analytics but can also be used in some s

Re: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy ?

2013-09-18 Thread Hiller, Dean
number vary widely based on differences in underlying hardware, or would you say from experience that something around 50M for medium to large datasets ( with upped file-descriptor limits ) is safe for most medium-sized (1 - 5 TB per node) to high-end (hundreds of TB) hardware ? On Wed, Sep 18, 201

Re: What is the ideal value for sstable_size_in_mb when using LeveledCompactionStrategy ?

2013-09-18 Thread Hiller, Dean
1. Always raise your file descriptor limits on linux for cassandra; even in 0.7 that was the recommendation, so cassandra could open tons of files. 2. We use 50M for our LCS with no performance issues. We had it at 10M on our previous with no issues, but a huge amount of files of course with our 30
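As a rough back-of-the-envelope illustration of why a smaller sstable size means many more files, here is a small Python sketch. The function name and the 1 TB figure are hypothetical, and it ignores compression and per-level overhead, so treat the numbers as an order-of-magnitude estimate only.

```python
import math

def estimated_sstables(data_size_mb: float, sstable_size_mb: float) -> int:
    """Rough count of sstables needed to hold data_size_mb of data."""
    if sstable_size_mb <= 0:
        raise ValueError("sstable size must be positive")
    return math.ceil(data_size_mb / sstable_size_mb)

# Hypothetical node holding 1 TB of data (not a figure from the thread).
one_tb_in_mb = 1024 * 1024

for size_mb in (10, 50):
    count = estimated_sstables(one_tb_in_mb, size_mb)
    print(f"sstable size {size_mb} MB -> about {count} sstables")
```

Each sstable is stored as several files on disk, so the number of files (and potentially open file descriptors) is a multiple of the sstable count.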

Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
gone..no heap dump, no log info > > This shouldn't happen if you have swap active in the server > > On Wednesday, September 18, 2013, Franc Carter wrote: > > A random guess - possibly an OOM (Out of Memory) where Linux will kill a > process to recover memory when it is

Revisit with another spin: is there any type of table existing on all nodes?

2013-09-18 Thread Hiller, Dean
er@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)? You could create a bunch of 1 node DCs if you really wanted it. On Fri, Sep 13, 2013 at

Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
output or the output of dmesg cheers On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean wrote: Anyone know how to debug cassandra processes just exiting? There is no info in the cassandra logs and there is no heap dump file(which in the past has shown up in /opt/cassandra/bin directory fo

Re: cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
it is desperately low on memory. Have a look in either your syslog output or the output of dmesg cheers On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote: Anyone know how to debug cassandra processes just exiting? There is no info in the cassandra logs a

cassandra just gone..no heap dump, no log info

2013-09-18 Thread Hiller, Dean
Anyone know how to debug cassandra processes just exiting? There is no info in the cassandra logs and there is no heap dump file(which in the past has shown up in /opt/cassandra/bin directory for me). This occurs when running a map/reduce job that put severe load on the system. The logs look

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
Netflix created file streaming in astyanax into cassandra specifically because writing too big a column cell is a bad thing. The limit is really dependent on use case….do you have servers writing 1000's of 200Meg files at the same time….if so, astyanax streaming may be a better way to go there

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
You have to first understand the rules of 1. Sstables are immutable so Color-1-Data.db will not be modified and only deleted once compacted 2. Memtables are flushed when reaching a limit so if Blue:{hex} is modified, it is done in the in-memory memtable that is eventually flushed 3. Once f

Re: questions related to the SSTable file

2013-09-17 Thread Hiller, Dean
You may want to be careful as column 1 could be stored in both files until compaction as well when column 1 has encountered changes and cassandra returns the latest column 1 version but two sstables contain column 1. (At least that is the way I understand it). Later, Dean From: "Takenori Sato

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
of 1 node DCs if you really wanted it. On Fri, Sep 13, 2013 at 12:29 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote: Actually, I have been on a few projects where something like that is useful. Gemfire(a grid memory cache) had that feature which we used at another company. O

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
an 11 node cluster it would be quorum reads / writes would need to >come from 6 nodes. It would probably be much slower for both reads & >writes. > >It sounds like what you want is a database with replication, not >partitioning. > >On Sep 13, 2013, at 11:15 AM, "Hi
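The "6 nodes out of 11" figure in the quoted reply follows from the usual quorum rule of a majority of replicas. A minimal Python sketch, assuming the table is replicated to every node so the replication factor equals the cluster size (the function name is just for illustration):

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1

# Replicating to all 11 nodes, as discussed in the thread.
print(quorum(11))
# The more typical RF=3 case, for comparison.
print(quorum(3))
```

This is why a table living on every node gets slower for quorum operations as the cluster grows, while reads at a consistency level of one stay local.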

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
che.org> Date: Friday, September 13, 2013 12:06 PM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)? On Fri, Sep 1

is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Hiller, Dean
I was just wondering if cassandra had any special CF that every row exists on every node for smaller tables that we would want to leverage in map/reduce. The table row count is less than 500k and we are ok with slow updates to the table, but this would make M/R blazingly fast since for every ro

Re: map/reduce performance time and sstable reader…

2013-09-03 Thread Hiller, Dean
We are considering creating our own InputFormat for hadoop and running the tasktrackers on every 3rd node (i.e., RF=3) such that we cover all ranges. Our M/R overhead appears to be 13 days vs. 12.5 hours on just reading SSTables directly on our current data set. I personally don't think parsing

is there a SSTableInput for Map/Reduce instead of ColumnFamily?

2013-08-30 Thread Hiller, Dean
is there a SSTableInput for Map/Reduce instead of ColumnFamily (which uses thrift)? We are not worried about repeated reads since we are idempotent but would rather have the direct speed (even if we had to read from a snapshot, it would be fine). (We would most likely run our M/R on 4 nodes of

map/reduce performance time and sstable reader…

2013-08-30 Thread Hiller, Dean
Has anyone done performance tests on sstable reading vs. M/R? I did a quick test on reading all SSTables in a LCS column family on 23 tables and took the average time it took sstable2json (to /dev/null to make it faster), which was 7 seconds per table. (reading to stdout took 16 seconds per tabl

Re: node dead after restart

2013-08-22 Thread Hiller, Dean
Isn't this the log file from 10.0.0.146??? And this 10.0.0.146 sees that 10.0.0.111 is up, then sees it dead and in the log we can see it bind with this line INFO 12:16:23,108 Binding thrift service to ip-10-0-0-146.ec2.internal/10.0.0.146:9160 What is the log file look

Re: Secondary Index Question

2013-08-21 Thread Hiller, Dean
can return the desired results ? > > >-----Original Message- >From: Hiller, Dean [mailto:dean.hil...@nrel.gov] >Sent: 21 August 2013 07:36 >To: user@cassandra.apache.org >Subject: Re: Secondary Index Question > >Yup, there are other types of indexing like that in PlayOrm whi

Re: Secondary Index Question

2013-08-21 Thread Hiller, Dean
> > >-Original Message- >From: Hiller, Dean [mailto:dean.hil...@nrel.gov] >Sent: 21 August 2013 07:36 >To: user@cassandra.apache.org >Subject: Re: Secondary Index Question > >Yup, there are other types of indexing like that in PlayOrm which do it >differently so all n

Re: Secondary Index Question

2013-08-21 Thread Hiller, Dean
Yup, there are other types of indexing like that in PlayOrm which do it differently so all nodes are not hit so it works better for instance if you are partitioning your data and you query into just a single partition so it doesn't put load on all the nodes. (of course, you have to have a parti

Re: make cassandra-cli use 7197 for JMX instead?

2013-07-29 Thread Hiller, Dean
Ugh, how did I miss that one, it was in the cassandra-cli --help… never mind. Dean On 7/29/13 11:24 AM, "Hiller, Dean" wrote: >I start nodetool with > >Cassandra-cli -p 9158 but it gives warnings about not displaying all >information because my JMX port is on 7197 inste

make cassandra-cli use 7197 for JMX instead?

2013-07-29 Thread Hiller, Dean
I start nodetool with Cassandra-cli -p 9158 but it gives warnings about not displaying all information because my JMX port is on 7197 instead of 7199 which is in use by another process. How do I make cassandra-cli connect to 7197 for the JMX stuff (right now it connects into another process whi

hadoop/cassandra integration using CL_ONE...

2013-07-26 Thread Hiller, Dean
Is it possible to use CL_ONE with hadoop/cassandra when doing an M/R job? And more importantly is there a way to configure that such that if my RF=3, that it only reads from 1 of the nodes in that 3. We have 12 nodes and ideally we would for example hope M/R runs on a2, a9, a5, a12 which happen

Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Oh, and in the past 0.20.x has been pretty stable by the way… they finally switched their numbering scheme thank god. Dean On 7/23/13 2:13 PM, "Hiller, Dean" wrote: >Perhaps try 0.20.2 as > > 1. The maven pom files have cassandra depending on 0.20.2 > 2. The 0.20.2 def

Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
t problems when inserting data, even stopping Cassandra, cleaning my entire data folder and then starting it again. I am also really curious to know if there is anyone else having these problems or if it is just me... Best regards, Marcelo. 2013/7/23 Hiller, Dean <dean.hil...@nrel.gov>

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Hiller, Dean
write load and I increase my cluster size 10 fold, it may be the case that I am limited by CPU as the memtable flushes are not happening as often since writes are thinned out across the cluster. Dean On 7/23/13 12:12 PM, "Hiller, Dean" wrote: >Out of curiosity, isn't what is r

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Hiller, Dean
Out of curiosity, isn't what is really happening is this "As writes keep coming in, memory fills up causing flushes to the commit log disk of the whole memtable. In a bursting scenario, writes are thus limited only by memory and cpu in short bursting cases that tend to fit in memory. In a mor

Re: About column family

2013-07-23 Thread Hiller, Dean
We use PlayOrm to have 60,000 VIRTUAL column families such that the performance is just fine ;). You may want to try something like that. Dean From: Robert Coli <rc...@eventbrite.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.or

Re: cassandra 1.2.6 -> Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Out of curiosity, what version of hadoop are you using with cassandra? I think we are trying 0.20.2 if I remember(I have to ask my guy working on it to be sure). I do remember him saying the cassandra maven dependency was odd in that it is in the older version and not a newer hadoop version.

Re: too many open files

2013-07-15 Thread Hiller, Dean
I believe too many open files is really too many open file descriptors so you may want to check number of sockets open as well to see if you hit the open file descriptor limit. Sockets open a descriptor and count toward the limit I believe….I am quite rusty in this and this is from my bad memor
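For anyone wanting to check the descriptor limit this message refers to, here is a minimal Python sketch using the standard `resource` module (Unix only). It reports the limits of the Python process itself, so to learn something about Cassandra it would need to run under the same user and limits configuration; that setup is an assumption, not something stated in the thread.

```python
import resource

def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def describe(limit: int) -> str:
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

soft, hard = open_file_limits()
print("soft limit:", describe(soft))
print("hard limit:", describe(hard))
```

Files and sockets both count against this limit, which matches the point made above about checking open sockets as well.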

Re: temporarily running a cassandra side by side in production

2013-07-12 Thread Hiller, Dean
- Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 11/07/2013, at 11:37 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote: We have a 12 node production cluster and a 4 node QA cluster. We are starting to think we are

temporarily running a cassandra side by side in production

2013-07-10 Thread Hiller, Dean
We have a 12 node production cluster and a 4 node QA cluster. We are starting to think we are going to try to run a side by side cassandra instance in production while we map/reduce from one cassandra into the new instance. We are intending to do something like this Modify all ports in cassan

playORM version 1.6 released

2013-07-08 Thread Hiller, Dean
Another new release is up in maven repos… - Astyanax is upgraded to 1.56.42 - Hbase support is almost done(barring those few test cases) - And following issues are fixed: Thanks to snazy and hsn10 :) https://github.com/deanhiller/playorm/issues/80 https://github.com/deanhiller/playorm/issues/81 h

column sort order and reversed sort performance question

2013-07-03 Thread Hiller, Dean
We loaded 5 million columns into a single row and when accessing the first 30k and last 30k columns we saw no performance difference. We tried just loading 2 rows from the beginning and end and saw no performance difference. I am sure reverse sort is there for a reason though. In what context

Re: How to do a CAS UPDATE on single column CF?

2013-07-01 Thread Hiller, Dean
What does CAS stand for? And is that the row locking feature like hbase's setAndReadWinner that you give the previous val and next val and your next val is returned if you won otherwise the current result is returned and you know some other node won? Thanks, Dean On 7/1/13 12:09 PM, "Blair Zajac"

Re: 10,000s of column families/keyspaces

2013-07-01 Thread Hiller, Dean
Oh and if you are using STCS, I don't think the below is an issue at all since that can run in parallel if needed already. Dean On 7/1/13 10:24 AM, "Hiller, Dean" wrote: >We use playorm to do 80,000 virtual column families(a playorm feature >though the pattern could be

Re: 10,000s of column families/keyspaces

2013-07-01 Thread Hiller, Dean
We use playorm to do 80,000 virtual column families (a playorm feature, though the pattern could be copied). We did find out later (and we are working on this now) that we wanted to map 80,000 virtual CFs into 10 real CFs so leveled compaction can run more in parallel, or else we get stuck

Re: NREL has released open source Databus on github for time series data

2013-06-25 Thread Hiller, Dean
h the time series data ? I had a quick look at the links and could not see anything. Cheers Aaron - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 22/06/2013, at 2:51 AM, "Hiller, Dean" mailto:dean.hil...@nrel.gov&

Re: sorting columns by time

2013-06-24 Thread Hiller, Dean
Send the naming scheme you desire. Is long time since epoch ok? Or a composite name of time since epoch + (something else) Dean From: Bill Hastings <bllhasti...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: M

Re: AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
o:rc...@eventbrite.com> To: user@cassandra.apache.org Sent: Monday, June 24, 2013 10:34 AM Subject: Re: AssertionError: Unknown keyspace? On Mon, Jun 24, 2013 at 6:04 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote: > Oh shoot, this is a seed node. Is

quick question on seed nodes configuration

2013-06-24 Thread Hiller, Dean
For ease of use, we actually had a single cassandra.yaml deployed to every machine and a script that swapped out the token and listen address. I had seed nodes ip1,ip2,ip3 as the seeds but what I didn't realize was then that these nodes had themselves as seeds. I am assuming that should never

Re: AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
the unknown keyspace errors :( but it is bootstrapping now) and I assume I can add node B back once all the data is in there. Thanks, Dean On 6/24/13 6:55 AM, "Hiller, Dean" wrote: >Ah, so digging deeper, it is not bootstrapping. How do I force the node >to bootstrap? (this is v

Re: AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
auto bootstrap is true according to this log DEBUG 06:53:03,411 setting auto_bootstrap to true OR better yet, if someone can point me to the code on where bootstrap is decided so I can see why it decides not to bootstrap? Thanks, Dean On 6/24/13 6:42 AM, "Hiller, Dean" wrote: >I

AssertionError: Unknown keyspace?

2013-06-24 Thread Hiller, Dean
I haven't seen this error in a long time. We just received the below error in production when rebuilding a node…any ideas on how to get around this? We had rebuilt 3 other nodes already I think(we have been swapping hardware) ERROR 06:32:21,474 Exception in thread Thread[ReadStage:1,5,main] ja

Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-24 Thread Hiller, Dean
We would be very very interested in your results. We currently run 10M but have heard of 256M sizes as well. Please let us know what you find out. Thanks, Dean From: Andrew Bialecki <andrew.biale...@gmail.com> Reply-To: "user@cassandra.apache.org"

NREL has released open source Databus on github for time series data

2013-06-21 Thread Hiller, Dean
NREL has released their open source databus. They spin it as energy data (and a system for campus energy/building energy) but it is very general right now and probably will stay pretty general. More information can be found here http://www.nrel.gov/analysis/databus/ The source code can be fou

Re: Unit Testing Cassandra

2013-06-19 Thread Hiller, Dean
For unit testing, we actually use PlayOrm which has an in-memory version of nosql so we just write unit tests against our code which uses the in-memory version but that is only if you are in java. Later, Dean From: Shahab Yunus <shahab.yu...@gmail.com> Reply-To: "user@cassandra.apache.org

Re: Large number of files for Leveled Compaction

2013-06-17 Thread Hiller, Dean
My bet is 5MB is the low end since many people go with the default. We upped it to 10MB as at that time no one knew of what size was a good size to be and the default was only 5MB. Dean From: Franc Carter <franc.car...@sirca.org.au> Reply-To: "user@cassandra.apache.org

Re: headed to cassandra conference next week in San Fran?

2013-06-10 Thread Hiller, Dean
>you are >using cassandra for and how it's working for you. > >I'm a software engineer at Quantcast and we're just beginning to use >cassandra. >So far it's been great, but there's still a lot to learn in this space. > >See you at the conference, hopef

headed to cassandra conference next week in San Fran?

2013-06-07 Thread Hiller, Dean
I would not mind meeting people there. My cell is 303-517-8902, best to text me probably or just email me at d...@alvazan.com. Later, Dean

Re: Consistency level for multi-datacenter setup

2013-06-03 Thread Hiller, Dean
you find out it would be extremely helpful. There is also this property that you can play with to take care of slow nodes dynamic_snitch_badness_threshold. http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold Thanks ! On Mon, Jun 3, 2013 at 3:24 PM,

Re: Consistency level for multi-datacenter setup

2013-06-03 Thread Hiller, Dean
Also, we had to put a fix into cassandra so it removed "slow nodes" from the list of nodes to read from. With that fix our QUORUM (not local quorum) started working again and would easily take the other DC nodes out of the list of reading from for you as well. I need to circle back to with my t

Re: Consistency level for multi-datacenter setup

2013-06-03 Thread Hiller, Dean
What happens when you use CL=TWO? Dean From: srmore <comom...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Monday, June 3, 2013 2:09 PM To: "user@cassandra.apache.org"

Re: Bulk loading into CQL3 Composite Columns

2013-05-31 Thread Hiller, Dean
Another option is not having it part of the primary key and using PlayOrm to query but to succeed and scale, you would need to also use PlayOrm partitions and then you can query in the partition and sort stuff. Dean From: Daniel Morton <dan...@djmorton.com> Reply-To: "user@cassandra.apac

Re: how to handle join properly in this case

2013-05-29 Thread Hiller, Dean
ld be appreciated. Thanks! > >On Tue, May 28, 2013 at 11:39 AM, Hiller, Dean >wrote: >> Another option is joins on partitions to keep the number of stuff >>needing >> to join relatively small. PlayOrm actually supports joins of partition >>1 >> of table A with

Re: random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Hiller, Dean
.org> Date: Wednesday, May 29, 2013 10:51 AM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: random thoughts for MUCH faster key lookup in cassandra How would you implement range queries? On 29 Ma

random thoughts for MUCH faster key lookup in cassandra

2013-05-29 Thread Hiller, Dean
We recently ran into too much data in one CF because LCS can't really run in parallel on one CF in a single tier which got me thinking, why doesn't the CF directory have 100 or 1000 directories 0-999 and cassandra hash the key to which directory it would go in and then put it in one of the sstabl
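The idea floated in this message, hashing a row key to one of a fixed number of directories, can be sketched in a few lines of Python. This is purely an illustration of the proposal, not how Cassandra lays out sstables; the function name, the choice of MD5, and the default of 1000 buckets are all assumptions made for the example.

```python
import hashlib

def bucket_for_key(row_key: bytes, buckets: int = 1000) -> int:
    """Map a row key to a stable bucket number in the range [0, buckets)."""
    if buckets < 1:
        raise ValueError("need at least one bucket")
    # A cryptographic digest is used only because it is stable across runs,
    # unlike Python's built-in hash() for bytes and strings.
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest[:8], "big") % buckets

# Hypothetical row keys, just to show the mapping is deterministic.
for key in (b"pk56", b"pk45", b"pk90"):
    print(key, "-> directory", bucket_for_key(key))
```

A lookup by key only has to consult the one directory the key hashes to; as the follow-up question in the thread points out, range queries do not benefit in the same way, since adjacent keys land in unrelated buckets.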

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
data9 such that we could be running 10 compactions in parallel. QUESTION: I am assuming 10 compactions should be enough to put enough load on the disk/cpu/ram etc. etc. or do you think I should go with 100CF's. 98% of our data is all in this one CF. Thanks, Dean On 5/29/13 10:06 AM, "

Re: Is there anyone who implemented time range partitions with column families?

2013-05-29 Thread Hiller, Dean
Nope, partitioning is done per CF in PlayOrm. Dean From: cem <cayiro...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Wednesday, May 29, 2013 10:01 AM To: "user@cassandra.apache.org

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Oh and yes, astyanax uses client side response latency and cassandra does the same as a client of the other nodes. Dean On 5/28/13 2:23 PM, "Hiller, Dean" wrote: >Actually, we did a huge investigation into this on astyanax and cassandra. > Astyanax if I remember worked if conf

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
website but this no longer happens to us such that when compaction kicks off on a single node, our cluster keeps going strong. Dean On 5/28/13 2:12 PM, "Dwight Smith" wrote: >How do you determine the slow node, client side response latency? > >-Original Message-----

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
hput to unlimited and it didn't help that much. Disk size just keeps on growing but I know that there is enough space to store 1 day data. What do you think about time range partitioning? Creating new column family for each partition and drop when you know that all records are expired. I have 5 node

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Also, how many nodes are you running? From: cem <cayiro...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Tuesday, May 28, 2013 1:17 PM To: "user@cassandra.apache.org" <use

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
You said compaction can't keep up. Are you manually running compaction all the time or just letting cassandra kick off compactions when needed? Is compaction always 100% running or are you saying your disk is growing faster than you like and would like compactions to be always 100% running? (

Re: data clean up problem

2013-05-28 Thread Hiller, Dean
Don't do any delete != "need to free the disk space after retention period" which you have in both your emails. My understanding is TTL is an expiry and just like tombstones will only be really deleted upon a compaction(ie. You do have deletes via TTL from the sound of it). If you have TTL of

Re: how to handle join properly in this case

2013-05-28 Thread Hiller, Dean
Another option is joins on partitions to keep the number of stuff needing to join relatively small. PlayOrm actually supports joins of partition 1 of table A with partition X of table B. You then just keep the number of rows in each partition at less than millions and you can filter with the wher

Re: weird token ownerships

2013-05-28 Thread Hiller, Dean
and the data exists on nodes a2, a3, and a4 but not on a1. You can see us inserting node a7 between a1 and a2, and inserting node a8 between node a2 and a3, etc. etc. Thanks, Dean On 5/28/13 8:46 AM, "Hiller, Dean" wrote: >I was assuming my node a1 would always own token 0, but we

weird token ownerships

2013-05-28 Thread Hiller, Dean
I was assuming my node a1 would always own token 0, but we just added 5 of 6 more nodes and a1 no longer owns that token range. I have a few questions on the table at the bottom 1. Is this supposed to happen where host a1 no longer owns token range 0(but that is in his cassandra.yaml file), b

Re: Using CQL to insert a column to a row dynamically

2013-05-27 Thread Hiller, Dean
Wide rows, dynamic columns are still possible in CQL3. There are some links here http://comments.gmane.org/gmane.comp.db.cassandra.user/30321 Also, there are other advantages to noSQL, not just schemaless aspect such as that it can accept tons of writes and you can scale the writes(you can't do

changing ips on node replacement

2013-05-24 Thread Hiller, Dean
I seem to remember problems with ghost nodes, etc. and I seem to remember if you are replacing a node and you don’t use the same ip, this can cause issues. Is this correct? We would like the new node to keep the same token, and the same host name but are wondering if we can change the ip since

Re: exception causes streaming to hang forever

2013-05-24 Thread Hiller, Dean
>On Fri, May 24, 2013 at 6:56 AM, Hiller, Dean >wrote: >> The exception on that node was just this >> >> ERROR [Thread-6056] 2013-05-22 14:47:59,416 CassandraDaemon.java (line >> 132) Exception in thread Thread[Thread-6056,5,main]

found the issue on bootstrap streaming hang

2013-05-24 Thread Hiller, Dean
For anyone else that might be interested, when the stream hangs, there are no exceptions around that time frame as to what exactly happened and why it hung (there is an exception, just not informative at all). We did find other exceptions that we "thought" were unrelated though days before. We fo

corrupt sstable

2013-05-24 Thread Hiller, Dean
We have a corrupt sstable databus5-nreldata-ib-36763-Data.db. How do we safely blow this away? (and then we would run repair to make sure all data is still there)… Can we just move the file out from under cassandra? (or would cassandra freak out?) Thanks, Dean

Re: exception causes streaming to hang forever

2013-05-24 Thread Hiller, Dean
>What kind of error does the other end of streaming(/10.10.42.36) say? > >On Wed, May 22, 2013 at 5:19 PM, Hiller, Dean >wrote: >> We had 3 nodes roll on good and the next 2, we see a remote node with >>this exception every time we start over and bootstrap the node >> >

exception causes streaming to hang forever

2013-05-22 Thread Hiller, Dean
We had 3 nodes roll on good and the next 2, we see a remote node with this exception every time we start over and bootstrap the node ERROR [Streaming to /10.10.42.36:2] 2013-05-22 14:47:59,404 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.10.42.36:2,5,main] java.la

Re: High performance disk io

2013-05-22 Thread Hiller, Dean
If you are only running repair on one node, should it not skip that node? So there should be no performance hit except when doing CL_ALL of course. We had to make a change to cassandra or slow nodes did impact us previously. Dean From: Wei Zhu <wz1...@yahoo.com> Reply-To: "user@cassand

Re: High performance disk io

2013-05-22 Thread Hiller, Dean
Well, if you just want to lower your I/O util %, you could always just add more nodes to the cluster ;). Dean From: Igor <i...@4friends.od.ua> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Wednesday, May 22, 2013 8:06 AM

bootstrapping a new node...

2013-05-21 Thread Hiller, Dean
We are using 1.2.2 cassandra and have rolled on 3 additional nodes to our 6 node cluster (totalling 9 so far). We are trying to roll on node 10 but during the streaming a compaction kicked off, which seemed very odd to us. "nodetool netstats" still reported tons of files that were not transferr

Re: (better info)any way to get the #writes/second, reads per second

2013-05-14 Thread Hiller, Dean
data distribution ? Did it settle down ? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 14/05/2013, at 5:06 AM, "Hiller, Dean" mailto:dean.hil...@nrel.gov>> wrote: Ah, okay iostat -x NEEDS a num

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-14 Thread Hiller, Dean
We had to roll out a fix in cassandra as a slow node was slowing down our clients of cassandra in 1.2.2 for some reason. Every time we had a slow node, we found out fast as performance degraded. We tested this in QA and had the same issue. This means a repair made that node slow which made ou

Re: any way to get the #writes/second, reads per second

2013-05-14 Thread Hiller, Dean
t/blog/en/monitoring-cassandra-relevant-data-should-be-watched-and-how-send-it-graphite 2013/5/13 Hiller, Dean mailto:dean.hil...@nrel.gov>> We are running a pretty consistent load on our cluster and added a new node to a 6 node cluster Friday (QA worked great, but production not so much). O

(better info)any way to get the #writes/second, reads per second

2013-05-13 Thread Hiller, Dean
sh -g datanodes nodetool compactionstats" Any reason why cassandra might be reading a lot from the data disks (not the commit log disk) more than usual? Thanks, Dean On 5/13/13 10:46 AM, "Hiller, Dean" wrote: >We are running a pretty consistent load on our cluster and added a new no

any way to get the #writes/second, reads per second

2013-05-13 Thread Hiller, Dean
We are running a pretty consistent load on our cluster and added a new node to a 6 node cluster Friday (QA worked great, but production not so much). One mistake that was made was starting up the new node, then disabling the firewall :( which allowed nodes to discover it BEFORE the node bootstrapped

Re: Replica info

2013-05-08 Thread Hiller, Dean
nodetool describering {keyspace} From: Kanwar Sangha mailto:kan...@mavenir.com>> Reply-To: "user@cassandra.apache.org" mailto:user@cassandra.apache.org>> Date: Wednesday, May 8, 2013 3:00 PM To: "user@cassandra.apache.org" mail

Re: CQL3 Data Model Question

2013-05-07 Thread Hiller, Dean
> >create 123_table organic_events ( > hour timestamp, > event_id UUID, > app_id INT, > event_time TIMESTAMP, > user_id INT, > …. > PRIMARY KEY (hour, event_time, event_id) >) WITH CLUSTERING ORDER BY (event_time desc); > > > >Is this what others are

Re: CQL3 Data Model Question

2013-05-07 Thread Hiller, Dean
We use PlayOrm to do 60,000 different streams which are all time series and use the virtual column families of PlayOrm so they are all in one column family. We then partition by time as well. I don't believe that we really have any hotspots from what I can tell. Dean From: Keith Wright mailt

index_interval

2013-05-06 Thread Hiller, Dean
I heard a rumor that index_interval is going away? What is the replacement for this? (We have been having to play with this setting a lot lately: too big and it gets slow, yet too small and cassandra uses way too much RAM…we are still trying to find the right balance with this setting). Than

Re: multitenant support with key spaces

2013-05-06 Thread Hiller, Dean
Another option may be virtual column families with PlayOrm. We currently do around 60,000 column families to store data from 60,000 different sensors that keep feeding us information. Dean On 5/6/13 11:18 AM, "Robert Coli" wrote: >On Sun, May 5, 2013 at 11:37 PM, Darren Smythe >wrote: >> How

Re: hector or astyanax

2013-05-06 Thread Hiller, Dean
n from making general claims without actual benchmarks to back them up. I do completely agree that Async interfaces have their place and have certain advantages over multi-threading models, but it's just another tool to be used when appropriate. Just my .02. :) On Mon, May 6, 2013 at 5:08 AM, H
