Never running repair: No need vs consequences in our usage pattern

2014-11-26 Thread Wayne Schroeder
is fine) and deleted data re-appearing (and we never delete, we just always use TTLs). Perhaps there is some other reason to run repair that we are not aware of? Wayne

Re: Lightweight transaction (paxos) vs double check.

2014-08-13 Thread Wayne Schroeder
in that regard? Wayne On Aug 13, 2014, at 1:10 PM, Robert Coli <rc...@eventbrite.com> wrote: On Wed, Aug 13, 2014 at 9:16 AM, Wayne Schroeder <wschroe...@pinsightmedia.com> wrote: Are there hidden costs to LWT (paxos) that are not represented in the total time a

Lightweight transaction (paxos) vs double check.

2014-08-13 Thread Wayne Schroeder
reconciliation system. Are there hidden costs to LWT (paxos) that are not represented in the total time and number of operations? For example, are there some under-the-hood locks that could cause contention issues when processing significant quantities of LWT under load? Wayne
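
A minimal sketch of the two approaches being weighed, using the DataStax Python driver and a hypothetical users table (the thread itself contains no code). The LWT costs extra Paxos round trips among replicas; the double check is cheaper per operation but racy:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("demo")  # hypothetical keyspace and table
    session.default_consistency_level = ConsistencyLevel.QUORUM

    # LWT: one client call, but Paxos adds extra replica round trips
    # (prepare/propose/commit) beyond a plain quorum write.
    applied = session.execute(
        "INSERT INTO users (id, name) VALUES (%s, %s) IF NOT EXISTS",
        (42, "wayne")).was_applied
    print(applied)  # False means another writer won the race

    # "Double check": quorum read, then a plain write. Each step is
    # cheaper than Paxos, but check-then-act is racy and relies on the
    # reconciliation system mentioned above.
    if session.execute("SELECT id FROM users WHERE id = %s", (42,)).one() is None:
        session.execute("INSERT INTO users (id, name) VALUES (%s, %s)",
                        (42, "wayne"))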

Update SSTable fragmentation

2014-04-09 Thread Wayne Schroeder
ating one column, will cassandra end up flushing that entire row into one SSTable on disk and then find a non-fragmented row quickly on reads instead of potentially reconstructing it across multiple SSTables? Obviously this has implications for space as a trade-off. Wayne

Re: Row cache for writes

2014-03-31 Thread Wayne Schroeder
Perhaps I should clarify my question. Is this possible / how might I accomplish this with cassandra? Wayne On Mar 31, 2014, at 12:58 PM, Robert Coli <rc...@eventbrite.com> wrote: On Mon, Mar 31, 2014 at 9:37 AM, Wayne Schroeder <wschroe...@pinsightmedia.com> wrot

Row cache for writes

2014-03-31 Thread Wayne Schroeder
Has anyone done anything similar to this that could provide direction? Wayne

Re: Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-11 Thread Wayne Schroeder
I think it will work just fine. I was just asking for opinions on whether there was some reason it would not work that I was not thinking of. On Mar 10, 2014, at 4:37 PM, Tupshin Harper <tups...@tupshin.com> wrote: Oh sorry, I misunderstood. But now I'm confused about how what you are tryi

Re: Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-10 Thread Wayne Schroeder
some other unrealized consequence. Additionally, it sounds like a potentially nice CQL-level feature -- a language keyword could be added to indicate that an LWT should be used so that the quorum update is all-or-nothing, at the expense of LWT overhead. Wayne On Mar 10, 2014, at
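
A minimal sketch of the idea, assuming a hypothetical write_status table tracked alongside the data (the DataStax Python driver here is only for illustration): after an ambiguous quorum-write timeout, one LWT pins down a single authoritative outcome.

    import uuid

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("demo")  # hypothetical keyspace and table

    op_id = uuid.uuid4()  # stored alongside the ambiguous quorum write

    # If this conditional update applies, the in-doubt write is
    # authoritatively CANCELLED; if it does not, the row was already
    # marked APPLIED. Either way there is exactly one outcome.
    result = session.execute(
        "UPDATE write_status SET state = 'CANCELLED' "
        "WHERE op_id = %s IF state = 'PENDING'",
        (op_id,))
    print(result.was_applied)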

Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-10 Thread Wayne Schroeder
ATIVE Thoughts? Wayne

Re: Caching prepared queries and different consistency levels

2014-02-28 Thread Wayne Schroeder
vely change the default of ONE that I am expecting. This is obviously specific to my application, but hopefully it helps anyone who has followed that pattern as well. Wayne On Feb 28, 2014, at 12:18 PM, Wayne Schroeder wrote: > After upgrading to the 2.0 driver branch, I received a lot

Caching prepared queries and different consistency levels

2014-02-28 Thread Wayne Schroeder
PreparedStatement? What I am hoping is that the PreparedStatement's consistency level is just used to initialize the BoundStatement and that the BoundStatement's consistency level is then used when executing the query. Wayne
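
The thread concerns the 2.0 branch of the DataStax Java driver; here is the same pattern sketched in the Python driver (keyspace and table hypothetical), where the prepared statement's level only seeds each bound statement and can be overridden per execution:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("demo")  # hypothetical keyspace and table

    prepared = session.prepare("SELECT name FROM users WHERE id = ?")
    prepared.consistency_level = ConsistencyLevel.ONE  # default for binds

    bound = prepared.bind((42,))
    bound.consistency_level = ConsistencyLevel.QUORUM  # wins for this execution
    session.execute(bound)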

Default serial consistency level

2014-02-25 Thread Wayne Schroeder
on is that the default is SERIAL and I do not need to set anything as this is what I want. Wayne
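
Rather than depending on a driver default, the serial consistency level can be set explicitly per statement; a sketch with the DataStax Python driver (keyspace and table hypothetical):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("demo")  # hypothetical keyspace and table

    stmt = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s) IF NOT EXISTS",
        consistency_level=ConsistencyLevel.QUORUM)
    # Explicit SERIAL (vs LOCAL_SERIAL) for the Paxos phase of the LWT.
    stmt.serial_consistency_level = ConsistencyLevel.SERIAL
    session.execute(stmt, (42, "wayne"))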

Re: Read Latency Degradation

2010-12-18 Thread Wayne
Rereading through everything, I am starting to wonder whether the page cache is being affected by compaction. We have been heavily loading data for weeks and compaction is basically running non-stop. The manual compaction should be done some time tomorrow, so when totally caught up I will try again

Re: Read Latency Degradation

2010-12-18 Thread Wayne
You are absolutely back to my main concern. Initially we were consistently seeing < 10ms read latency and now we see 25ms (30GB sstable file), 50ms (100GB sstable file) and 65ms (330GB sstable file) read times for a single read with nothing else going on in the cluster. Concurrency is not our problem

Re: Read Latency Degradation

2010-12-18 Thread Wayne
We are using XFS for the data volume. We are load testing now, and compaction is way behind, but weekly manual compaction should help catch things up. Smaller nodes just seem to fit the Cassandra architecture a lot better. We cannot use cloud instances, so the cost for us to go to <500gb nodes is

Re: Read Latency Degradation

2010-12-17 Thread Wayne
one CF, 98gig data, 587k filter, 18meg index for the other. Thanks. On Fri, Dec 17, 2010 at 10:58 AM, Edward Capriolo wrote: > On Fri, Dec 17, 2010 at 8:21 AM, Wayne wrote: > > We have been testing Cassandra for 6+ months and now have 10TB in 10 nodes with rf=3. It

Re: Read Latency Degradation

2010-12-17 Thread Wayne
ion just does not add up. Thanks in advance for any advice/experience you can provide. On Fri, Dec 17, 2010 at 5:07 AM, Daniel Doubleday wrote: > > On Dec 16, 2010, at 11:35 PM, Wayne wrote: > > > I have read that read latency goes up with the total data size, but to >

Read Latency Degradation

2010-12-16 Thread Wayne
We are running 0.6.8 and are reaching 1TB/node in a 10 node cluster rf=3. Our read times seem to be getting worse as we load data into the cluster, and I am worried there is a scale problem in terms of large column families. All benchmarks/times come from cfstats reporting so no client code or time

multiple datacenter with low replication factor - idea for greater flexibility

2010-11-10 Thread Wayne Lewis
trategy interface to pass the endpoints to getQuorumResponseHandler and getWriteResponseHandler, but otherwise changes are contained in the plugin. There is more analysis I can share if anyone is interested. But at this point I'd like to get feedback. Thanks, Wayne Lewis

Re: Backup Strategy

2010-11-09 Thread Wayne
, Nov 9, 2010 at 12:04 PM, Edward Capriolo wrote: > On Tue, Nov 9, 2010 at 8:15 AM, Wayne wrote: > > I got some very good advice on manual compaction so I thought I would throw out another question on raid/backup strategies for production clusters. > > > > We are

Backup Strategy

2010-11-09 Thread Wayne
(which will be in minutes). Thanks. Wayne

Manual Compaction in Production

2010-11-08 Thread Wayne
ffect on a node and its ability to function under heavy load, our assumption is that staggering it over the weekend, for example (our low-usage time), would be best. Any recommendations? Thanks Wayne
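
One way to stagger it, sketched in Python around nodetool (node addresses hypothetical): compact one node at a time during the low-usage window, so only one node pays the compaction I/O cost at any moment.

    import subprocess
    import time

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node list

    for host in NODES:
        # Major compaction on one node at a time; the others keep
        # serving reads at full speed in the meantime.
        subprocess.run(["nodetool", "-h", host, "compact"], check=True)
        time.sleep(3600)  # let page cache and I/O settle before the next node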

10G Ethernet / Infiniband

2010-10-26 Thread Wayne
Is anyone out there using 10G Ethernet or Infiniband with Cassandra in a cluster where there is a noticeable increase in performance or reduced latency to warrant the cost of the faster network connections? Any experience or even rationale would help. We are considering Infiniband but are on the f

Re: Read Latency

2010-10-20 Thread Wayne
get_slice with start="", finish="" and count = 100,001 > - pop the last column and store its name > - get_slice with start as the last column name, finish="" and count = 100,001 > > repeat. > > A > > On 20 Oct, 2010, at 03:08 PM, Wayne
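
A runnable sketch of that paging loop; fetch_slice here is an in-memory stand-in for the Thrift get_slice call, not the real API, kept only to show how popping the extra column avoids counting anything twice:

    import bisect

    NAMES = ["col%06d" % i for i in range(250000)]  # fake wide row

    def fetch_slice(start, count):
        # Stand-in for get_slice(start=start, finish="", count=count):
        # up to `count` column names >= start, in comparator order.
        i = bisect.bisect_left(NAMES, start)
        return NAMES[i:i + count]

    PAGE = 100001  # one extra column marks where the next page begins
    start, total = "", 0
    while True:
        page = fetch_slice(start, PAGE)
        if len(page) < PAGE:
            total += len(page)  # final (possibly short) page
            break
        start = page.pop()  # popped name is the inclusive start of the next page
        total += len(page)
    print(total)  # 250000, each column counted exactly once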

Re: Read Latency

2010-10-19 Thread Wayne
for csc in buffer > ] > print "Done2 in %s" % (time.time() -start) > > {977} > python decode_test.py 10 > Done in 0.75460100174 > Done2 in 0.314303874969 > > {978} > python decode_test.py 60 > Done in 13.2945489883 > Done2 in 7.328611850

Re: Read Latency

2010-10-19 Thread Wayne
> > Out of interest are you able to try the avro client? It's still experimental (0.7 only) but may give you something to compare it against. > > Aaron > On 20 Oct, 2010, at 07:23 AM, Wayne wrote: > > It is an entire row which is 600,000 cols. We pass a limit of 10mill

Re: Read Latency

2010-10-19 Thread Wayne
should always be what takes the longest. Everything else is passing through memory. On Tue, Oct 19, 2010 at 2:06 PM, aaron morton wrote: > Wayne, > I'm calling cassandra from Python and have not seen too many 3 second reads. > > Your last email with log messages in it looks

Re: Read Latency

2010-10-19 Thread Wayne
> On Tue, Oct 19, 2010 at 8:21 AM, Wayne wrote: > > The changes seem to do the trick. We are down to about 1/2 of the original quorum read performance. I did not see any more errors. > > > > More than 3 seconds on the client side is still not acceptable to us.

Re: Read Latency

2010-10-19 Thread Wayne
The changes seem to do the trick. We are down to about 1/2 of the original quorum read performance. I did not see any more errors. More than 3 seconds on the client side is still not acceptable to us. We need the data in Python, but would we be better off going through Java or something else to i

Re: Read Latency

2010-10-18 Thread Wayne
48,542 ReadResponseResolver.java (line 116) digests verified DEBUG [pool-1-thread-33] 2010-10-18 19:22:48,542 ReadResponseResolver.java (line 133) resolve: 162 ms. DEBUG [pool-1-thread-33] 2010-10-18 19:22:48,542 StorageProxy.java (line 494) quorumResponseHandler: 2688 ms. On Sat, Oct 16, 2

Re: Read Latency

2010-10-16 Thread Wayne
, Jonathan Ellis wrote: > Stack trace from cassandra log? > > On Sat, Oct 16, 2010 at 6:50 AM, Wayne wrote: > > While doing a read I get a TApplicationException: Internal Error processing get_slice. > > > > On Fri, Oct 15, 2010 at 5:49 PM, Jonathan Elli

Re: Read Latency

2010-10-16 Thread Wayne
While doing a read I get a TApplicationException: Internal Error processing get_slice. On Fri, Oct 15, 2010 at 5:49 PM, Jonathan Ellis wrote: > On Fri, Oct 15, 2010 at 2:21 PM, Wayne wrote: > > The optimization definitely shaved off some time. Now it is running about 3x > >

Re: Read Latency

2010-10-15 Thread Wayne
-p0 < resolve.txt against 0.6.6, should also work against 0.6.5) and let us know the results, that would be great. > > It also adds an optimization to avoid cloning the result in the common case where digests match (which is the case in your logs, or you would see "dig

0.6.6 Problems Starting

2010-10-14 Thread Wayne
We have upgraded from 0.6.5 to 0.6.6 and our nodes will not come up. See error below. Did something change that we need to change in the config files? Thanks. INFO 22:13:37,761 JNA not found. Native methods will be disabled. INFO 22:13:38,083 DiskAccessMode is standard, indexAccessMode is mmap E

Read Latency

2010-10-14 Thread Wayne
Can someone help to determine the anatomy of a quorum read? We are trying to understand why cfstats reports one time and the client actually gets data back almost 4x slower. Below are the debug logs from a read where all 3 nodes reported < 2.5secs response time in cfstats but the client did not get da

Read Latency

2010-10-06 Thread Wayne
I have been seeing some strange trends in read latency that I wanted to throw out there to find some explanations. We are running 0.6.5 in a 10 node cluster rf=3. We find that the read latency reported by cfstats is always about 1/4 of the actual time it takes to get the data back to the python

Re: Cassandra performance

2010-09-15 Thread Wayne
If MySQL is faster, then use it. I struggled for months to do side-by-side comparisons with MySQL until finally realizing the two are too different to compare that way. MySQL is always faster out of the gate when you come at the problem thinking in terms of relational databases. Add in repli

Re: buggy secondary indexes?

2010-09-13 Thread Wayne
This is a use case we have been struggling with for a long time. How do we maintain indexes? It is easy to write out a secondary index, even manually, but how does one maintain an index when a value changes? Our only scalable answer has been background processes that churn through everything to ver
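
The thread predates CQL, but the problem is easy to sketch in modern terms with the DataStax Python driver and hypothetical users / users_by_email tables. Note the read-before-write, and that a concurrent writer can still leave a stale entry behind, which is exactly why the background verification churn described above is needed:

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("demo")  # hypothetical keyspace and tables

    def update_email(user_id, new_email):
        # Read the old value so its index entry can be deleted; skipping
        # this leaves dead index entries behind.
        old = session.execute(
            "SELECT email FROM users WHERE id = %s", (user_id,)).one()
        if old and old.email and old.email != new_email:
            session.execute(
                "DELETE FROM users_by_email WHERE email = %s AND id = %s",
                (old.email, user_id))
        session.execute(
            "INSERT INTO users_by_email (email, id) VALUES (%s, %s)",
            (new_email, user_id))
        session.execute(
            "UPDATE users SET email = %s WHERE id = %s",
            (new_email, user_id))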

Re: Node OOM Problems

2010-08-23 Thread Wayne
Thanks for the confirmation this is NOT the way to go. I will stick with 4 disks raid 0 with a single data directory. On Mon, Aug 23, 2010 at 9:24 PM, Rob Coli wrote: > On 8/22/10 12:00 AM, Wayne wrote: > >> Due to compaction being so expensive in terms of disk resources, does it

Re: Node OOM Problems

2010-08-22 Thread Wayne
ank everyone for their help. On Sun, Aug 22, 2010 at 10:37 PM, Benjamin Black wrote: > Wayne, > > Bulk loading this much data is a very different prospect from needing to sustain that rate of updates indefinitely. As was suggested earlier, you likely need to tune things diff

Re: Node OOM Problems

2010-08-22 Thread Wayne
Has anyone loaded 2+ terabytes of real data in one stretch into a cluster without bulk loading and without any problems? How long did it take? What kind of nodes were used? How many writes/sec/node can be sustained for 24+ hours? On Sun, Aug 22, 2010 at 8:22 PM, Peter Schuller wrote: > I only

Re: Node OOM Problems

2010-08-22 Thread Wayne
. > > > b > > On Sat, Aug 21, 2010 at 11:27 PM, Wayne wrote: > > Thank you for the advice, I will try these settings. I am running defaults right now. The disk subsystem is one SATA disk for commitlog and 4 SATA disks in raid 0 for the data. > >

Re: Node OOM Problems

2010-08-22 Thread Wayne
spindles? On Sun, Aug 22, 2010 at 8:27 AM, Wayne wrote: > Thank you for the advice, I will try these settings. I am running defaults right now. The disk subsystem is one SATA disk for commitlog and 4 SATA disks in raid 0 for the data. > > From your email you are implying this hardware

Re: Node OOM Problems

2010-08-21 Thread Wayne
tivity demands a lot faster storage (iops and bandwidth). > > > b > On Sat, Aug 21, 2010 at 2:18 AM, Wayne wrote: > > I am already running with those options. I thought maybe that is why they never get completed, as they keep getting pushed down in priority? I am > >

Re: Node OOM Problems

2010-08-21 Thread Wayne
Sat, Aug 21, 2010 at 3:17 AM, Wayne wrote: > > I could tell from munin that the disk utilization was getting crazy high, > > but the strange thing is that it seemed to "stall". The utilization went way down and everything seemed to flatten out. Requests pil

Re: Node OOM Problems

2010-08-20 Thread Wayne
. On Fri, Aug 20, 2010 at 7:51 PM, Edward Capriolo wrote: > On Fri, Aug 20, 2010 at 1:17 PM, Wayne wrote: > > I turned off the creation of the secondary indexes which had the large rows > > and all seemed good. Thank you for the help. I was getting 60k+ writes/seco

Re: Node OOM Problems

2010-08-20 Thread Wayne
kicking in cause this? I have added the 3 JVM settings to make compaction a lower priority. Did adding them contribute to this by slowing compaction down and letting it build up on a heavily loaded system? Thanks in advance for any help someone can provide. On Fri, Aug 20, 2010 at 8:34 AM, Wayne wrote

Re: Node OOM Problems

2010-08-19 Thread Wayne
The NullPointerException does not crash the node. It only makes it flap/go down for a short period and then it comes back up. I do not see anything abnormal in the system log, only that single error in the cassandra.log. On Thu, Aug 19, 2010 at 11:42 PM, Peter Schuller <peter.schul...@infidyne.c

Re: Node OOM Problems

2010-08-19 Thread Wayne
What is my "live set"? Is the system CPU bound given the few statements below? This is from running 4 concurrent processes against the node...do I need to throttle back the concurrent read/writers? I do all reads/writes as Quorum. (Replication factor of 3). The memtable threshold is the default o

Re: Key Caching

2010-07-26 Thread Wayne
If the cache is stored in the heap, how big can the heap be made realistically on a 24gb ram machine? I am a Java newbie but I have read concerns about going over 8gb for the heap as the GC can be too painful/take too long. I already have seen timeout issues (node is dead errors) under load during G

Key Caching

2010-07-26 Thread Wayne
how to design our cluster for this. We are currently testing with 4 nodes with replication factor of 3 (24gb ram 8 core), and we plan to expand the node count as required to fit most/all keys into memory. Thanks in advance for any help you can provide. Wayne

Read Latency

2010-05-11 Thread Wayne
I am evaluating Cassandra, and read latency is the biggest concern in terms of performance. As I test various scenarios and configurations I am getting surprising results. I have a 2 node cluster with both nodes connected to direct attached storage. The read latency pulling data off the raid 10 sto