Re: Cassandra 3.10: ClassCastException in ThreadAwareSecurityManager

2017-03-31 Thread Edward Capriolo
I created https://issues.apache.org/jira/browse/CASSANDRA-13396 for you /** * The purpose of this class is */ this purpose of this class is ...what ? this class is who? sicka sicka slim shady. On Thu, Mar 30, 2017 at 1:48 PM, Anton PASSIOUK <

Re: Assertions being hit on Cassandra 3.5 cluster (UnfilteredRowIterators.concat)

2017-03-22 Thread Edward Capriolo
On Wed, Mar 22, 2017 at 4:34 PM, Daniel Miranda wrote: > I found out the problem is conditional on having the row cache enabled. > Whenever a query would return an empty result set in a particular table, it > would fail instead with the exception being thrown on all nodes. > > Disabling the r

Ye old singleton debate

2017-03-15 Thread Edward Capriolo
This question came up today: OK, say you mock, how do you construct a working multi-process representation of how C* actually works from within a unit test without running the code that actually constructs the cluster? 1) Don't do that (construct a multinode cluster in a test) just mock the crap

Re: scylladb

2017-03-12 Thread Edward Capriolo
On Sun, Mar 12, 2017 at 3:45 PM, Dor Laor wrote: > On Sun, Mar 12, 2017 at 12:11 PM, Edward Capriolo > wrote: > >> The simple claim that "Scylla IS a drop in replacement for C*" shows >> that they clearly don't know as much as they think they do. >

Re: scylladb

2017-03-12 Thread Edward Capriolo
The simple claim that "Scylla IS a drop in replacement for C*" shows that they clearly don't know as much as they think they do. Even if it did supposedly "support everything" it would not actually work like that. For example, some things in Cassandra work "the way they work". They are not specif

Re: scylladb

2017-03-12 Thread Edward Capriolo
On Sun, Mar 12, 2017 at 11:40 AM, Edward Capriolo wrote: > > > On Sun, Mar 12, 2017 at 1:38 AM, benjamin roth wrote: > >> There is no reason to be angry. This is progress. This is the circle of >> life. >> >> It happens anywhere at any time. >>

Re: scylladb

2017-03-12 Thread Edward Capriolo
On Sun, Mar 12, 2017 at 1:38 AM, benjamin roth wrote: > There is no reason to be angry. This is progress. This is the circle of > life. > > It happens anywhere at any time. > > On 12.03.2017 07:34, "Dor Laor" wrote: > >> On Sat, Mar 11, 2017 at 10:02 PM, Jeff Jirsa wrote: >> >>> >>> >>> On 201

Re: scylladb

2017-03-11 Thread Edward Capriolo
On Sat, Mar 11, 2017 at 9:41 PM, daemeon reiydelle wrote: > Recall that garbage collection on a busy node can occur minutes or seconds > apart. Note that stop the world GC also happens as frequently as every > couple of minutes on every node. Remove that and do the simple arithmetic. > > > sent f

Re: scylladb

2017-03-11 Thread Edward Capriolo
On Sat, Mar 11, 2017 at 2:08 PM, Bhuvan Rawal wrote: > "Lastly, why don't you test Scylla yourself? It's pretty easy to set up, > there's nothing to tune." > - The details are indeed compelling enough to go ahead and test it for a > specific use case. > > If it works out well it can lead to good

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Edward Capriolo
On Saturday, March 4, 2017, Thakrar, Jayesh wrote: > LCS does not rule out frequent updates - it just says that there will be > more frequent compaction, which can potentially increase compaction > activity (which again can be throttled as needed). > > But STCS will guarantee OOM when you have la

Re: question of keyspace that just disappeared

2017-03-03 Thread Edward Capriolo
On Fri, Mar 3, 2017 at 7:56 AM, Romain Hardouin wrote: > I suspect a lack of 3.x reliability. Cassandra could have given up with > dropped messages but not with a "drop keyspace". I mean I already saw some > spark jobs with too many executors that produce a high load average on a > DC. I saw a C* n

Re: Is periodic manual repair necessary?

2017-02-27 Thread Edward Capriolo
There are 4 anti-entropy systems in Cassandra: hinted handoff, read repair, commit logs, and the repair command. All are basically best effort. Commit logs get corrupt and only flush periodically. Bits rot on disk and while crossing networks. Read repair is async and only happens randomly. Hinted h

Re: Pluggable throttling of read and write queries

2017-02-22 Thread Edward Capriolo
's better to split a large cluster into smaller ones >> except if you also manage the client layer that queries Cassandra and you can put some >> backpressure or rate limit in it. >> > We have an internal storage API layer that some of the clients use, but > there are many customers wh

Re: How does cassandra achieve Linearizability?

2017-02-22 Thread Edward Capriolo
e only use Paxos for CAS I > think? So in a batch by definition all but one will fail the CAS. This is > something where a distinguished coordinator could help by failing the rest > of the contending requests more inexpensively than it currently does. > > > Ariel > > On Thu

Re: Trouble implementing CAS operation with LWT query

2017-02-22 Thread Edward Capriolo
On Wed, Feb 22, 2017 at 8:42 AM, 안정아 wrote: > Hi, all > > > > I'm trying to implement a typical CAS operation with LWT query(conditional > update). > > But I'm having trouble keeping integrity of the result when > WriteTimeoutException occurs. > > according to http://www.datastax.com/dev/blog/cas

Pluggable throttling of read and write queries

2017-02-20 Thread Edward Capriolo
Older versions had a request scheduler API. On Monday, February 20, 2017, Ben Slater wrote: > We’ve actually had several customers where we’ve done the opposite - split > large clusters apart to separate use cases. We found that this allowed us > to better align hardware with use case requirem

Re: cassandra user request log

2017-02-20 Thread Edward Capriolo
Not directly. Consider proxying requests through an application server and logging at that level. On Friday, February 10, 2017, Benjamin Roth wrote: > If you want to audit write operations only, you could maybe use CDC, this > is a quite new feature in 3.x (I think it was introduced in 3.9 or 3.10) >

Re: Count(*) is not working

2017-02-20 Thread Edward Capriolo
Seems worth it to file a bug since some here are under the impression it almost always works and others are under the impression it almost never works. On Friday, February 17, 2017, kurt greaves wrote: > really... well that's good to know. it still almost never works though. i > guess every time

Re: High disk io read load

2017-02-19 Thread Edward Capriolo
On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth wrote: > We are talking about a read IO increase of over 2000% with 512 tokens > compared to 256 tokens. 100% increase would be linear which would be > perfect. 200% would even be okay, taking the RAM/Load ratio for caching into > account. But > 20x the

Re: How does cassandra achieve Linearizability?

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 4:33 PM, Ariel Weisberg wrote: > Hi, > > Classic Paxos doesn't have a leader. There are variants on the original > Lamport approach that will elect a leader (or some other variation like > Mencius) to improve throughput, latency, and performance under contention. > Cassand

Re: High disk io read load

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 12:38 AM, Benjamin Roth wrote: > It doesn't really look like that: > https://cl.ly/2c3Z1u2k0u2I > > That's the ReadLatency.count metric aggregated by host which represents the > actual read operations, correct? > > 2017-02-15 23:01 GMT+01:00 Edwa

Re: High disk io read load

2017-02-15 Thread Edward Capriolo
I think it has more than double the load. It is double the data. More read repair chances. More load can swing its way during node failures etc. On Wednesday, February 15, 2017, Benjamin Roth wrote: > Hi there, > > Following situation in cluster with 10 nodes: > Node A's disk read IO is ~20 tim

Re: Determining if data will be created on Cassandra Write Exceptions

2017-02-14 Thread Edward Capriolo
On Tue, Feb 14, 2017 at 2:30 PM, rouble wrote: > Cassandra Gurus, > > I have a question open on Stack Overflow on how to determine if data is > actually created when a Write exception is thrown: > http://stackoverflow.com/questions/42231140/determining-if-data-will-be-created-on-cassandra-write

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-02-13 Thread Edward Capriolo
>> what is this shill fest >> >> >> On 12 Feb. 2017 8:24 am, "Kant Kodali" wrote: >> >> Saw this one today... >> >> https://news.ycombinator.com/item?id=13624062 >> >> On Tue, Jan 3, 2017 at 6:27 AM, Eric Evans wrote

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
> > ever create is low. You will regret BOP until the end of time. > > On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo wrote: > > > > Probably best to avoid BOP even if you are already hashing keys > > yourself. Wh

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
On Sat, Feb 11, 2017 at 10:54 AM, Jonathan Haddad wrote: > The odds of only using a sha1 as your partition key for every table you > ever create is low. You will regret BOP until the end of time. > On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo > wrote: > >> Probably best

Re: ByteOrdered partitioner when using sha-1 as partition key

2017-02-11 Thread Edward Capriolo
Probably best to avoid BOP even if you are already hashing keys yourself. What do you do when checksums collide? It is possible, right? On Saturday, February 11, 2017, Micha wrote: > Hi, > > my table has a sha-1 sum as partition key. Would in this case the > ByteOrdered partitioner be a better c

Re: Composite partition key token

2017-02-09 Thread Edward Capriolo
On Thu, Feb 9, 2017 at 9:26 AM, Michael Burman wrote: > Hi, > > How about taking it from the BoundStatement directly? > > ByteBuffer routingKey = b.getRoutingKey(ProtocolVersion.NEWEST_SUPPORTED, > codecRegistry); > Token token = metadata.newToken(routingKey); > > In this case the b is the "Bound

Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Edward Capriolo
On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler wrote: > The Cassandra team is pleased to announce the release of Apache > Cassandra version 3.10. > > Apache Cassandra is a fully distributed database. It is the right choice > when you need scalability and high availability without compromising > p

Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Edward Capriolo
On Tue, Jan 17, 2017 at 11:47 AM, Mike Torra wrote: > Thanks for the feedback everyone! Redis `zincrby` and `zrangebyscore` are > indeed what we use today. > > Caching the resulting 'sorted sets' in redis is exactly what I plan to do. > There will be tens of thousands of these sorted sets, each g

Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Edward Capriolo
On Fri, Jan 13, 2017 at 8:14 PM, Jonathan Haddad wrote: > I've thought about this for years and have never arrived at a particularly > great implementation. Your idea will be maybe OK if the sets are very > small and if the values don't change very often. But in a system where the > values of t

Re: implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Edward Capriolo
On Fri, Jan 13, 2017 at 5:14 PM, Mike Torra wrote: > We currently use redis to store sorted sets that we increment many, many > times more than we read. For example, only about 5% of these sets are ever > read. We are getting to the point where redis is becoming difficult to > scale (currently at

Re: Strange issue wherein cassandra not being started from cron

2017-01-10 Thread Edward Capriolo
On Tuesday, January 10, 2017, Jonathan Haddad wrote: > Last I checked, cron doesn't load the same, full environment you see when > you log in. Also, why put Cassandra on a cron? > On Mon, Jan 9, 2017 at 9:47 PM Bhuvan Rawal > wrote: > >> Hi Ajay, >> >> Have you had a look at cron logs? - mine is

Re: Help

2017-01-09 Thread Edward Capriolo
On Sun, Jan 8, 2017 at 11:30 PM, Anshu Vajpayee wrote: > Gossip shows - all nodes are up. > > But when we perform writes , coordinator stores the hints. It means - > coordinator was not able to deliver the writes to a few nodes after meeting > consistency requirements. > > The nodes for which wr

Re: Logs appear to contradict themselves during bootstrap steps

2017-01-06 Thread Edward Capriolo
On Fri, Jan 6, 2017 at 6:45 PM, Sotirios Delimanolis wrote: > I forgot to check nodetool gossipinfo. Still, why does the first check > think that the address exists, but the second doesn't? > > > On Friday, January 6, 2017 1:11 PM, David Berry > wrote: > > > I’ve encountered this previously wher

Re: weird jvm metrics

2017-01-05 Thread Edward Capriolo
On Thu, Jan 5, 2017 at 1:53 PM, Alain Rastoul wrote: > On 01/04/2017 11:12 PM, Edward Capriolo wrote: > >> The metric-reporter is actually leveraged from another project. >> >> https://github.com/addthis/metrics-reporter-config >> >> Check the version of me

Re: weird jvm metrics

2017-01-04 Thread Edward Capriolo
The metric-reporter is actually leveraged from another project. https://github.com/addthis/metrics-reporter-config Check the version of metric-reporter (in cassandra/lib) and see if it has changed from your old version to your new version. On Wed, Jan 4, 2017 at 12:02 PM, Mike Torra wrote:

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-01-02 Thread Edward Capriolo
e. > open the JDK source code and read it. you will encounter some great ideas > and Algorithms. > > > > > > On Mon, Jan 2, 2017 at 1:04 PM, Edward Capriolo > wrote: > >> >> On Mon, Jan 2, 2017 at 3:51 PM, Benjamin Roth >> wrote: >> >>>

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-01-02 Thread Edward Capriolo
On Mon, Jan 2, 2017 at 3:51 PM, Benjamin Roth wrote: > Does this discussion really make sense any more? To me it seems it turned > opinionated and religious. From my point of view anything that has to be > said was said. > > On 02.01.2017 21:27, "Edward Capriolo" wrote

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2017-01-02 Thread Edward Capriolo
On Mon, Jan 2, 2017 at 11:56 AM, Eric Evans wrote: > On Fri, Dec 23, 2016 at 9:15 PM, Edward Capriolo > wrote: > > "I don't really have any opinions on Oracle per se, but Cassandra is a > > Free Software project and I would prefer that we not depend on > > com

Re: Query

2016-12-29 Thread Edward Capriolo
You should start with understanding your needs. Once you understand your need you can pick the software that fits your need. Starting with a software stack is backwards. On Thu, Dec 29, 2016 at 11:34 PM, Ben Slater wrote: > I wasn’t familiar with Gizzard either so I thought I’d take a look. The >

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-26 Thread Edward Capriolo
lling to > pay even for distributed databases so I don't think anyone would pay for a > programming language. In short, let me end by saying Oracle just has a lot of > self-interest but I really hope that I am wrong since I am a big fan of JVM. > > > > > > On Fri, Dec 23,

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-23 Thread Edward Capriolo
On Fri, Dec 23, 2016 at 6:01 AM, Kant Kodali wrote: > Java 9 Module system looks really interesting. I would be very curious to > see how Cassandra would leverage that. > > On Thu, Dec 22, 2016 at 9:09 AM, Kant Kodali wrote: > >> I would agree with Eric with his following statement. In fact, I w

Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread Edward Capriolo
Anecdotally, CAS works differently than the typical Cassandra workload. If you run a stress instance with 3 nodes on one host, you find that you typically run into CPU issues, but if you are doing a CAS workload you see things timing out before you hit 100% CPU. It is a strange beast. On Fri, Dec 23, 201

Re: Advice in upgrade plan from 1.2.18 to 2.2.8

2016-12-22 Thread Edward Capriolo
Also, before you get started, make sure: 1) no one attempts to change schema during the process 2) no one attempts to run a repair 3) no one attempts to join a node 4) no one attempts to remove/move nodes from the cluster. Each of these things triggers repair sessions and streams data, which do not wor

Re: Why does Cassandra recommends Oracle JVM instead of OpenJDK?

2016-12-21 Thread Edward Capriolo
On Wednesday, December 21, 2016, Kant Kodali wrote: > https://www.youtube.com/watch?v=9ei-rbULWoA > > On Wed, Dec 21, 2016 at 2:59 AM, Kant Kodali > wrote: > >> https://www.elastic.co/guide/en/elasticsearch/guide/current/_java_virtual_machine.html >> >> On Wed, Dec 21, 2016 at 2:58 AM, Kant

Re: Benefit of LOCAL_SERIAL consistency

2016-12-08 Thread Edward Capriolo
t's exactly what I mean. >> (Your comment is very helpful to support my opinion.) >> >> As you said, SERIAL with multi-DCs incurs latency increase, >> but it's a trade-off between latency and high availability because one >> DC can be down from a disaster. >

Re: Batch size warnings

2016-12-07 Thread Edward Capriolo
I have been circling around a thought process over batches. Now that Cassandra has aggregating functions, it might be possible to write a type of record that has an END_OF_BATCH type marker and the data can be suppressed from view until it is all there. I.e., you write something like a checksum record
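
A minimal sketch of the marker idea above, under loudly stated assumptions: rows carry a hypothetical `batch_id`, and the writer closes each batch with a hypothetical `END_OF_BATCH` record that holds the expected row count (standing in for the checksum record). None of this is an existing Cassandra feature; it only illustrates a reader-side filter that hides rows until their batch is complete.

```python
from collections import defaultdict

END_OF_BATCH = "END_OF_BATCH"  # hypothetical marker type, not a Cassandra constant


def visible_rows(records):
    """Return data values only from batches that are completely written.

    Each record is a dict with hypothetical keys: batch_id, kind
    ("DATA" or END_OF_BATCH), and either value (for DATA rows) or
    expected_count (for the marker record).
    """
    data = defaultdict(list)   # batch_id -> values seen so far
    expected = {}              # batch_id -> row count promised by the marker
    for rec in records:
        if rec["kind"] == END_OF_BATCH:
            expected[rec["batch_id"]] = rec["expected_count"]
        else:
            data[rec["batch_id"]].append(rec["value"])

    visible = []
    for batch_id in sorted(data):
        # A batch is shown only if its marker arrived and the count matches.
        if expected.get(batch_id) == len(data[batch_id]):
            visible.extend(data[batch_id])
    return visible
```

A batch whose marker is missing, or whose marker disagrees with the number of rows actually present, stays hidden. A real checksum over the row contents would be stricter than a count, at the cost of reading every row.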

Re: Benefit of LOCAL_SERIAL consistency

2016-12-07 Thread Edward Capriolo
On Wed, Dec 7, 2016 at 8:25 AM, DuyHai Doan wrote: > The reason you don't want to use SERIAL in multi-DC clusters is the > prohibitive cost of lightweight transaction (in terms of latency), > especially if your data centers are separated by continents. A ping from > London to New York takes 52ms j

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Sat, Dec 3, 2016 at 11:01 AM, Edward Capriolo wrote: > > > On Saturday, December 3, 2016, Edward Capriolo > wrote: > >> >> >> On Saturday, December 3, 2016, Jonathan Haddad wrote: >> >>> That isn't what the original thread is about. The

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Saturday, December 3, 2016, Edward Capriolo wrote: > > > On Saturday, December 3, 2016, Jonathan Haddad > wrote: > >> That isn't what the original thread is about. The thread is about the >> timestamp portion of the UUID being different. >> >> Havi

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
virtually every time. > On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo > wrote: > >> >> >> On Friday, December 2, 2016, Jonathan Haddad > > wrote: >> >>> This isn't about using the same UUID though. It's about the timestamp >>> bits

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo > wrote: > >> >> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne > > wrote: >> >>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo >> > wrote: >>> >>>> >>>>

Re: Why does `now()` produce different times within the same query?

2016-12-02 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne wrote: > On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo > wrote: > >> >> I am not sure you saw my reply on thread but I believe everyone's needs >> can be met I will copy that here: >> > > I

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne wrote: > On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo > wrote: > >> >> I am not sure you saw my reply on thread but I believe everyone's needs >> can be met I will copy that here: >> > > I

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 4:06 AM, Sylvain Lebresne wrote: > One can of course always open a JIRA, but I'm going to strongly disagree > with a > change here (outside of a documentation one that is). > > The now() function is a timeuuid generator, and it thus generates a unique > timeuuid on every ca

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Edward Capriolo
On Wed, Nov 30, 2016 at 10:53 PM, Cody Yancey wrote: > This is not a bug, and in fact changing it would be a serious bug. > > False. Absolutely no consumer would be broken by a change to guarantee an > identical time component that isn't broken already, for the simple reason > your code alrea

Re: Schema Changes

2016-11-15 Thread Edward Capriolo
You can start here: https://issues.apache.org/jira/browse/CASSANDRA-10699 And here: http://stackoverflow.com/questions/20293897/cassandra-resolution-of-concurrent-schema-changes In a nutshell, schema changes work best when issued serially, when all nodes are up and reachable. When these 3 con

Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Edward Capriolo
Here is a solution that I have leveraged. Ignore the count of the value and use a multi-part column name as its value. For example: create column family stuff ( rowkey string, column string, value string, counter_to_ignore long, primary key( rowkey, column, value)); On Tue, Nov 1, 2016 at 9:29

Re: how to get the size of the particular partition key belonging to an sstable ??

2016-10-28 Thread Edward Capriolo
There are actually multiple tickets for different size functions. Examples include computing size of collections, number of rows, and physical sizes server side. I also have a patch to make the warn and info settable at runtime. https://issues.apache.org/jira/browse/CASSANDRA-12661?filter=-1 It

Re: Tools to manage repairs

2016-10-28 Thread Edward Capriolo
On Fri, Oct 28, 2016 at 11:21 AM, Vincent Rischmann wrote: > Doesn't paging help with this ? Also if we select a range via the cluster > key we're never really selecting the full partition. Or is that wrong ? > > > On Fri, Oct 28, 2016, at 05:00 PM, Edward Capriolo wrot

Re: Tools to manage repairs

2016-10-28 Thread Edward Capriolo
Big partitions are an anti-pattern. Here is why: First, Cassandra is not an analytic datastore. Sure, it has some UDFs and aggregate UDFs, but the true purpose of the data store is to satisfy point reads. Operations have strict timeouts: # How long the coordinator should wait for read operations to

Re: Cassandra failure during read query at consistency QUORUM (2 responses were required but only 0 replica responded, 2 failed)

2016-10-28 Thread Edward Capriolo
This looks like another case of an assert bubbling through try/catch blocks that don't catch assertion errors. On Fri, Oct 28, 2016 at 6:30 AM, Denis Mikhaylov wrote: > Hi! > > We’re running Cassandra 3.9 > > On the application side I see failed reads with this exception > com.datastax.driver.core.exceptions.Read

Re: How does the "batch" commit log sync works

2016-10-27 Thread Edward Capriolo
I mentioned during my Cassandra.yaml presentation at the summit that I never saw anyone use these settings. Things off by default are typically not covered well by tests. It sounds like it is not working. Quick suggestion: go back in time maybe to a version like 1.2.X or 0.7 and see if i

Re: Handle Leap Seconds with Cassandra

2016-10-27 Thread Edward Capriolo
Following https://issues.apache.org/jira/browse/CASSANDRA-9131. It is very interesting to track how the timestamp has moved from the user, to the server, then back to the user quasi the driver. Next we will be accounting for the Earth's slowing rotation as the ice caps melt :) https://www.uwgb.edu

Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

2016-10-26 Thread Edward Capriolo
I would suggest you look at some existing work http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html and attempt to re-create those scenarios and methodologies for failing nodes and seeing the performance impact. This would yield faster and more easily verifiable results tha

Re: Keyspace/CF creation Timeouts

2016-10-25 Thread Edward Capriolo
I do not believe the ConsistencyLevel matters for schema changes. In recent versions request_timeout_in_ms has been replaced by N variables which allow different timeout values for different types of operations. You seem to have both a lot of keyspaces and column families. It seems likely that you

Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Edward Capriolo
I have not read the entire thread so sorry if this is already mentioned. You should review your logs, a potential problem could be a corrupted sstable. In a situation like this you will notice that the system is repeatedly trying to compact a given sstable. The compaction fails and based on the he

Re: Question on write failures logs show Uncaught exception on thread Thread[MutationStage-1,5,main]

2016-10-24 Thread Edward Capriolo
The driver will enforce a max batch size of 65k. This is an issue in versions of cassandra like 2.1.X. There are control variables for the logged and unlogged batch size. You may also have to tweak your commitlog size as well. I demonstrate this here: https://github.com/edwardcapriolo/ec/blob/mast

Re: Inconsistencies in materialized views

2016-10-17 Thread Edward Capriolo
https://issues.apache.org/jira/browse/CASSANDRA-11198 Which has problems "maybe" fixed by: https://issues.apache.org/jira/browse/CASSANDRA-11475 Which has its own set of problems. One of these patches was merged into 3.7, which tells you that you are running a version (3.6) with known bugs. Also as the fe

Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Edward Capriolo
The "2 billion column limit" is press-clipping "puffery". This statement seemingly became popular because of a highly trafficked story, in which a tech reporter embellished a statement to make a splashy article. The effect is something like this: http://www.healthnewsreview.org/2012/08/iced-tea

Re: Question on Read Repair

2016-10-11 Thread Edward Capriolo
This is the theory but not always the practice. The failure detector's heartbeating is a process happening outside the read. Take for example a cluster with Replication Factor 3. At time('1) the failure detector might read three nodes as UP. A request "soon after '1" issued at time('2) might start a read pro

Re: Running Cassandra in Integration Tests

2016-10-06 Thread Edward Capriolo
Check out https://github.com/edwardcapriolo/farsandra. It falls under the realm of almost 100% pure Java (besides the fact it uses some shell to launch Cassandra). On Thu, Oct 6, 2016 at 7:08 PM, Ali Akhtar wrote: > Is it possible to create an isolated cassandra instance which is run > during int

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
Since the feature is off by default, the coverage might be only as deep as the specific tests that test it. On Mon, Oct 3, 2016 at 4:54 PM, Jeff Jirsa wrote: > Seems like it’s probably worth opening a jira issue to track it (either to > confirm it’s a bug, or to be able to better explain i

Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Edward Capriolo
I undertook a similar effort a while ago. https://issues.apache.org/jira/browse/CASSANDRA-7014 Other than the fact that it was closed with no comments, I can tell you that other efforts I had to embed things in Cassandra did not go swimmingly. Although at the time ideas were rejected like groovy

Re: Row cache not working

2016-10-03 Thread Edward Capriolo
I was thinking about this issue. I was wondering on the dev side if it would make sense to make a utility for the unit tests that could enable tracing and then assert that a number of steps in the trace happened. Something like: setup() runQuery("SELECT * FROM X") Assertion.assertTrace("Preparing
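
A minimal sketch of the trace-assertion idea above, assuming the trace of a query is available to the test as a plain list of activity strings. `assert_trace` is a hypothetical helper written for illustration, not an existing Cassandra test utility, and the sample trace strings in the usage note are made up.

```python
def assert_trace(trace_events, expected_steps):
    """Assert that every expected step appears in the trace, in order.

    trace_events: list of activity strings from a traced query
                  (how they are collected is left out of this sketch).
    expected_steps: substrings that must match successive trace events.
    """
    position = 0
    for step in expected_steps:
        for index in range(position, len(trace_events)):
            if step in trace_events[index]:
                # Continue searching after the event that matched.
                position = index + 1
                break
        else:
            raise AssertionError(f"trace step not found in order: {step!r}")
```

With a trace such as `["Parsing SELECT * FROM X", "Preparing statement", "Row cache hit"]`, the call `assert_trace(events, ["Preparing", "Row cache hit"])` passes, while asking for the same steps in reverse order raises `AssertionError`. Matching on substrings keeps the test tolerant of extra trace events between the steps it cares about.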

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
there suck, starting with >> IBM at the top. >> >> Saying the docs suck isn't an indictment of anyone, it's just the reality >> of writing good documentation. >> >> On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad >> wrote: >> >>> Nobody is cla

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
that it needs to be renamed. > > Any relational db could (and I'm sure one does!) allow for sparse fields > as well. MySQL can be backed by rocksdb now, does that make it not a row > store? > > You're arguing that everything is wrong but you're not proposing an >

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
Also every piece of technical information that describes a rowstore http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems Does it like this: 001:10,Smith,Joe,4; 002:12,Jones,Mary,5; 003:11,Johnson,Cathy

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
with Cassandra you can have >> everything in 1 node, which means there is only 1 partition and no >> different to 1 instance of sql server. Where you win is when you need to >> add 2 more nodes, Cassandra makes this easier whereas with SqlServer and >> Oracle you have to do a

Re: Cassandra data model right definition

2016-10-01 Thread Edward Capriolo
ke a case where the user is attempting to write and delete 1 row and 1 column 6 billion times a day. Then you end up explaining to them http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached and how the cassandra storage model is not "like a relat

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
I can iterate over JSON data stored in mongo and present it as a table with rows and columns. It does not make mongo a rowstore. On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo wrote: > The problem with calling it a row store: > > https://en.wikipedia.org/wiki/Row_(database) > >

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
The problem with calling it a row store: https://en.wikipedia.org/wiki/Row_(database) In the context of a relational database, a *row*—also called a record or tuple

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
Then: Physically: A data store which is physically a log-structured merge of SSTables (see https://cloud.google.com/bigtable/). Now: One of the changes made in Apache Cassandra 3.0 is a relatively important refactor of the storage engine. I say refac

Re: Way to write to dc1 but keep data only in dc2

2016-09-29 Thread Edward Capriolo
You can do something like this, though your use of terminology like "queue" really does not apply. You can set up your keyspace with replication in only one data center. CREATE KEYSPACE NTSkeyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc2' : 3 }; This will make the NTSkeyspace

Re: TRUNCATE throws OperationTimedOut randomly

2016-09-28 Thread Edward Capriolo
Truncate does a few things (based on version) truncate takes snapshots truncate causes a flush in very old versions truncate causes a schema migration. In newer versions like cassandra 3.4 you have this knob. # How long the coordinator should wait for truncates to complete # (This can be mu

Re: Reproducing exception in cassandra for testing failover scenarios

2016-09-24 Thread Edward Capriolo
You can also look at ccmbridge and farsandra. Here is an example of bringing up an 8 node 3 datacenter cluster in a single unit test using farsandra. https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/ThreeDcTest.java On Sat, Sep 24, 2016 at 3:53 PM, Jonathan Haddad wrote: > H

Re: Lightweight tx is good enough to handle counter?

2016-09-23 Thread Edward Capriolo
This might help you: https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/CompareAndSwapTest.java It counts using LWTs with multiple threads. On Fri, Sep 23, 2016 at 2:31 PM, Jaydeep Chovatia < chovatia.jayd...@gmail.com> wrote: > Since SERIAL consistency is not supported for ba
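
For readers who do not want to open the linked test, the idea of counting with LWTs from multiple threads can be sketched roughly as below. This is not the code from CompareAndSwapTest.java and it does not talk to Cassandra at all: CasRegister is a hypothetical in-memory stand-in for a row updated with a conditional write (something like UPDATE ... SET v = new IF v = expected), and increment and run are illustrative names. The point is only the read, conditional write, retry loop.

```python
import threading


class CasRegister:
    """Hypothetical in-memory stand-in for one row updated conditionally."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        # Apply the write only if the stored value still equals `expected`.
        # Returns True when applied, False when another writer got there first.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(register):
    # Read the current value, attempt a conditional write, retry on conflict.
    while True:
        current = register.read()
        if register.compare_and_set(current, current + 1):
            return


def run(threads=8, increments_per_thread=500):
    register = CasRegister()

    def worker():
        for _ in range(increments_per_thread):
            increment(register)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return register.read()


if __name__ == "__main__":
    print(run())
```

Against a real cluster each failed conditional write costs a full round of the lightweight-transaction protocol, so under heavy contention the retry loop gets expensive; the sketch says nothing about that cost.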

Re: Upgrading from Cassandra 2.1.12 to 3.0.9

2016-09-23 Thread Edward Capriolo
To be clear about the mixed versions. You do not want to do it. Especially if the versions are very far apart. Typically you cannot run repair in mixed versions. You cannot do schema changes with mixed versions. Data files from new versions are not readable from old versions. Basically you only

Re: Partition size

2016-09-12 Thread Edward Capriolo
In US English it is also debatable which words are profane. https://simple.wikipedia.org/wiki/Profanity Different words can be profanity to different people, and what words are thought of as profanity in English can change over time. Suggestion: https://www.youtube.com/watch?v=L0MK7qz13bU O

Re: select query returns wrong value if use DESC option

2014-03-13 Thread Edward Capriolo
Consider filing a JIRA. CQL is the standard interface to Cassandra; everything is heavily tested. On Thursday, March 13, 2014, Katsutoshi Nagaoka wrote: > Hi. > > I am using Cassandra 2.0.6 version. There is a case that select query returns wrong value if use DESC option. My test procedure is as fo

Re:

2014-03-12 Thread Edward Capriolo
That is too much RAM for Cassandra; make that 6g to 10g. The uneven perf could be because your requests do not shard evenly. On Wednesday, March 12, 2014, Batranut Bogdan wrote: > Hello all, > > The environment: > > I have a 6 node Cassandra cluster. On each node I have: > - 32 G RAM > - 24 G RAM

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
This brainstorming idea has already been -1 ed in jira. ROFL. On Wed, Mar 12, 2014 at 12:26 PM, Tupshin Harper wrote: > OK, so I'm greatly encouraged by the level of interest in this. I went > ahead and created https://issues.apache.org/jira/browse/CASSANDRA-6846, > and will be starting to look

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
vailable. I do think it can be integrated further to make >>>> moderate to complex queries easier and probably faster. That's why we built >>>> our own JPA-like object query API. I would love to see Cassandra get to the >>>> point where users can defin

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
sure about join though.. >>>> >>>> >>>> On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin wrote: >>>> >>>>> >>>>> Hi Ed, >>>>> >>>>> I agree Solr is deeply integrated into DSE. I've l

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt > > wrote: >> >>> Okay, I'm officially lost on this thread. If you plan on forking >>> Cassandra to preserve and continue to enhance the Thrift interface, you >>> would also want to add a bunch of relational

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
hread. If you plan on forking >> Cassandra to preserve and continue to enhance the Thrift interface, you >> would also want to add a bunch of relational features to CQL as part of >> that same fork? >> >> >> On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo >

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
ade studying > and implementing query optimizers. All of these things can be done, it's > just a matter of people finding the time to do it. > > > > > On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo wrote: > >> Peter, >> >> My advice. Do not bother.

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Peter, My advice. Do not bother. I have become very active recently in attempting to add features to thrift. I had 4 open tickets I was actively working on. (I even found two bugs in Cassandra in the process). People were aware of this and still called this vote. Several commit people have vo

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
ach method for this small use case. Otherwise I would take that up. On Tue, Mar 11, 2014 at 12:07 PM, Peter Lin wrote: > > if I have time this summer, I may work on that, since I like having thrift. > > > On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo > wrote: > >>
