Re: Network transfer to one node twice as others
Totally depends on the load balancing policy of your driver, your data model, consistency level, and your replication factor. The default token-aware policy for the DataStax Java driver up to 2.1.4 and 2.0.9 would largely behave this way if you combined it with a hot partition, and to my knowledge the DataStax drivers in other languages still behave the same way today. If you're using the DataStax driver, go ahead and change the load balancing policy to something like DCAwareRoundRobin only and see if the result is different.

Honestly this is probably not a question for the dev mailing list, or even the Cassandra user mailing list, but likely one for whatever driver you're using.

- Ryan

> On Apr 22, 2015, at 9:16 AM, Anishek Agarwal wrote: > > Nope not using thrift > On 22-Apr-2015 7:24 pm, "Benedict Elliott Smith" > wrote: > >> If you're connecting via thrift, all your traffic is most likely being >> routed to just one node, which then communicates with the other nodes for >> you. >> >> On Wed, Apr 22, 2015 at 6:11 AM, Anishek Agarwal >> wrote: >> >>> Forwarding it here, someone with Cassandra internals knowledge can help >> may >>> be >>> >>> Additionally, i observe the same behavior for reads too where Network >> read >>> from one node is twice than other two.. >>> >>> >>> -- Forwarded message -- >>> From: Anishek Agarwal >>> Date: Tue, Apr 21, 2015 at 5:15 PM >>> Subject: Network transfer to one node twice as others >>> To: "u...@cassandra.apache.org" >>> >>> >>> Hello, >>> >>> We are using cassandra 2.0.14 and have a cluster of 3 nodes. I have a >>> writer test (written in java) that runs 50 threads to populate data to a >>> single table in a single keyspace. >>> >>> when i look at the "iftop" I see that the amount of network transfer >>> happening on two nodes is same but on one of the nodes its almost 2ice as >>> the other two, Any reason that would be the case ? >>> >>> Thanks >>> Anishek >>> >>
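For reference, a minimal sketch of that policy change with the 2.x Java driver; the contact point and data center name below are placeholders, not values from this thread:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LbPolicyExample {
    public static void main(String[] args) {
        // The 2.x default is TokenAwarePolicy wrapping DCAwareRoundRobinPolicy, which
        // routes each request to a replica of its partition key; with a hot partition
        // that concentrates traffic on the replicas of that one key.
        // DCAwareRoundRobinPolicy alone round-robins coordinators across the local DC,
        // which spreads the load evenly and makes the comparison Ryan suggests.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")                                  // placeholder contact point
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))   // placeholder DC name
                .build();
        Session session = cluster.connect();
        // ... rerun the same write test and compare the iftop output ...
        cluster.close();
    }
}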
Re: Proposal: deprecate Thrift now and remove support in 4.0
+1 On Mon, Dec 28, 2015 at 12:26 PM, Jonathan Haddad wrote: > +1 > > On Mon, Dec 28, 2015 at 7:04 AM Dave Brosius > wrote: > > > +1 now > > > > On 12/28/2015 09:26 AM, Jonathan Ellis wrote: > > > Thrift has been officially frozen for almost two years and unofficially > > for > > > longer. [1] Meanwhile, maintaining Thrift support through changes like > > > 8099 has been a substantial investment. > > > > > > I propose deprecating Thrift now and removing support in 4.0, i.e. Nov > > 2016 > > > if tick-tock goes as planned. > > > > > > I note that only 7% of our survey respondents [2] are using > Thrift-only, > > > and and those users are often on old releases (1.1 and 1.2), i.e. > > unlikely > > > to upgrade to 4.x anyway. > > > > > > Another 20% of users are using a mix of Thrift and CQL. Some have been > > > unable to completely migrate because CQL doesn’t quite provide every > > > feature from Thrift. The last such outstanding issue is mixing static > > and > > > dynamic Thrift “columns” in a single table. We have an issue open to > > > address this [3]. > > > > > > I think it is reasonable to either deprecate Thrift immediately in 3.2 > or > > > to wait until 10857 is committed in 3.4. > > > > > > [1] > > > > > > http://mail-archives.apache.org/mod_mbox/cassandra-dev/201403.mbox/%3ccaldd-zim6knmr7f_zcpvpqk0b2g919tttathiuofnvlztaw...@mail.gmail.com%3E > > > > > > [2] > > > > > > https://docs.google.com/spreadsheets/d/1FegCArZgj2DNAjNkcXi1n2Y1Kfvf6cdZedkMPYQdvC0/edit#gid=0 > > > > > > [3] https://issues.apache.org/jira/browse/CASSANDRA-10857 > > > > > > > > -- Thanks, Ryan Svihla
Re: Modeling nested collection with C* 2.0
Ahmed, Just using a text column and serializing the nested data as JSON is the easy way and a common approach. However, this list is for Cassandra committer discussion; please be so kind as to use the regular user list for data modeling questions and for any future responses to this email thread. Regards, Ryan Svihla

> On Jan 28, 2016, at 7:28 AM, Ahmed Eljami wrote: > > Hi, > > I need your help for modeling a nested collection with cassanrda2.0 (UDT no, > no fozen) > > My users table contains emails by type, each type of email contains multiple > emails. > > Example: > Type: pro. emails: {a...@mail.com, b...@mail.com ...} > > Type: private. emails: {c...@mail.com, d...@mail.com} > . > > The user table also contains addresses, address type with fields. > > Example: > > Type: Pro. address {Street= aaa, number = 123, apartment = bbb} > > Type: Private. address {Street = bbb, number = 123, apartment = kkk } > > I am looking for a solution to store all these columns in one table. > > Thank you.
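A minimal sketch of that text/JSON approach with a 2.x Java driver, assuming a map keyed by type whose values are JSON strings; the keyspace, table, and sample values are made up for illustration:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.google.common.collect.ImmutableMap;

public class JsonAsTextExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // One row per user; each dynamic group of values is stored as a JSON string
        // keyed by its type, so no UDTs or frozen/nested collections are needed on 2.0.
        session.execute("CREATE TABLE IF NOT EXISTS ks.users ("
                + " user_id text PRIMARY KEY,"
                + " emails map<text, text>,"       // e.g. 'pro' -> JSON array of addresses
                + " addresses map<text, text>)");  // e.g. 'pro' -> JSON object of street/number/apartment

        session.execute(
                "INSERT INTO ks.users (user_id, emails, addresses) VALUES (?, ?, ?)",
                "user1",
                ImmutableMap.of(
                        "pro", "[\"a@example.com\",\"b@example.com\"]",
                        "private", "[\"c@example.com\"]"),
                ImmutableMap.of(
                        "pro", "{\"street\":\"aaa\",\"number\":123,\"apartment\":\"bbb\"}"));

        cluster.close();
    }
}

The application (de)serializes the JSON values itself, which is the trade-off of this approach: Cassandra cannot index or filter inside them.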
Re: Missing rows while scanning table using java driver
Priyanka, This is a better question for the Cassandra user mailing list (cc’d above), which is where many experts in the use of Cassandra are subscribed, whereas this list is more about improving or changing Cassandra itself. As to your issue, several problems could be combining at once to produce this behavior, so can I suggest you respond on the user list with the following:

- Keyspace configuration (RF especially), data center layout, and table configuration.
- Any errors in the logs on the Cassandra nodes.

Regards, Ryan Svihla

> On Feb 2, 2016, at 4:58 AM, Priyanka Gugale wrote: > > I am using query of the form: select * from %t where token(%p) > %s limit > %l; > > where t=tablename, %p=primary key, %s=token value of primary key and l=limit > > -Priyanka > > On Mon, Feb 1, 2016 at 6:19 PM, Priyanka Gugale wrote: > >> Hi, >> >> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to scan >> a table as per suggestions given here >> <http://www.myhowto.org/bigdata/2013/11/04/scanning-the-entire-cassandra-column-family-with-cql/>, >> On running the code to fetch records from table, it fetches different >> number of records on each run. Some times it reads all records from table, >> and some times some records are missing. As I have observed there is no >> fixed pattern for missing records. >> >> I have tried to set consistency level to ALL while running select query >> still I couldn't fetch all records. Is there any known issue? Or am I >> suppose to do anything more than running simple "select" statement. >> >> Code snippet to fetch data: >> >> SimpleStatement stmt = new SimpleStatement(query); >> stmt.setConsistencyLevel(ConsistencyLevel.ALL); >> ResultSet result = session.execute(stmt); >> if (!result.isExhausted()) { >> for (Row row : result) { >> process(row); >> } >> } >> >> -Priyanka >>
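For what it's worth, with the 2.x Java driver a full-table read can also lean on the driver's automatic paging rather than hand-built token() ranges, which avoids the boundary bugs that manual range stitching tends to introduce. A minimal sketch, with the keyspace, table, and fetch size as placeholders:

import com.datastax.driver.core.*;

public class FullScanExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // The driver fetches the next page transparently while iterating,
        // so no manual token(...) bookkeeping is required.
        Statement stmt = new SimpleStatement("SELECT * FROM ks.tbl");
        stmt.setFetchSize(1000);                         // rows per page, tune as needed
        stmt.setConsistencyLevel(ConsistencyLevel.ALL);  // matches the level tried in the thread

        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {
            // process(row);
        }
        cluster.close();
    }
}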
Re: DSE Release Planned That Corresponds with Cassandra 3.2
Corry, This is the Cassandra developer mailing list aimed at contributors to the Cassandra code base (not all of whom work for DataStax for example). Can I suggest you contact DataStax and ask the same question? Regards, Ryan Svihla > On Feb 3, 2016, at 9:27 AM, Corry Opdenakker wrote: > > Hi everyone, > > Yesterday evening I installed DSE for the first time at my macbook, and I > must say that untill now it runs very fast with 0 users and 0 data records:) > > I was reading the release notes of C* 3.x and it reports "The Storage > Engine has been refactored", "Java 8 required", and the introduction of > "Materialized Views" as some of the most important changes. > When will there be a DSE version released that Corresponds with Cassandra > 3.2 or any other 3.x subversion? > If there isn't one planned yet then I'll probably switch to C* 3.2 before > starting development. > > I've searched at several places but I couldn't find a DSE release > announcement or product roadmap that covers this topic. > If anyone could mention the relevant page, then that would be great. > > Cheers, Corry
Re: DSE Release Planned That Corresponds with Cassandra 3.2
I’d just contact sa...@datastax.com instead. > On Feb 3, 2016, at 9:32 AM, Corry Opdenakker wrote: > > You are Right Ryan, but since there is no DSE mailinglist, I thought I > could ask it overhere. > Thanks anyway for the reply. > Regards, Corry > > On Wed, Feb 3, 2016 at 4:30 PM, Ryan Svihla wrote: > >> Corry, >> >> This is the Cassandra developer mailing list aimed at contributors to the >> Cassandra code base (not all of whom work for DataStax for example). Can I >> suggest you contact DataStax and ask the same question? >> >> >> Regards, >> >> Ryan Svihla >> >>> On Feb 3, 2016, at 9:27 AM, Corry Opdenakker wrote: >>> >>> Hi everyone, >>> >>> Yesterday evening I installed DSE for the first time at my macbook, and I >>> must say that untill now it runs very fast with 0 users and 0 data >> records:) >>> >>> I was reading the release notes of C* 3.x and it reports "The Storage >>> Engine has been refactored", "Java 8 required", and the introduction of >>> "Materialized Views" as some of the most important changes. >>> When will there be a DSE version released that Corresponds with Cassandra >>> 3.2 or any other 3.x subversion? >>> If there isn't one planned yet then I'll probably switch to C* 3.2 before >>> starting development. >>> >>> I've searched at several places but I couldn't find a DSE release >>> announcement or product roadmap that covers this topic. >>> If anyone could mention the relevant page, then that would be great. >>> >>> Cheers, Corry >> >> > > > -- > -- > Bestdata.be > Optimised ict > Tel:+32(0)496609576 > co...@bestdata.be > --
Re: Keyspaces not found in cqlsh
Kedar, I recommend asking the user list (u...@cassandra.apache.org); this list is for the development of Cassandra, and you're more likely to find someone on the user list who has hit this issue. Curious issue though; I haven't seen that myself. Regards, Ryan Svihla

> On Feb 11, 2016, at 7:56 AM, kedar wrote: > > Dev Team, > > Need some help with a burning cqlsh issue > > I am using cqlsh 5.0.1 | Cassandra 2.1.2, recently we are unable to see > / desc keyspaces and query tables through cqlsh on either of the two nodes > > cqlsh> desc keyspaces > > > > cqlsh> use user_index; > cqlsh:user_index> desc table list_1_10; > > Keyspace 'user_index' not found. > cqlsh:user_index> > cqlsh> select * from system.schema_keyspaces; > Keyspace 'system' not found. > cqlsh> > We are running a 2 node cluster. The Python - Django app that inserts > data is running without any failure and system logs show nothing abnormal. > > ./nodetool repair on one node hasn't helped ./nodetool cfstats shows all > the tables too > > ls -l cassandra/data/* on each node: > > https://gist.github.com/anonymous/3dddbe728a52c07d7c52 > https://gist.github.com/anonymous/302ade0875dd6410087b > > > > > -- > Thanks, > Kedar Parikh
Re: What is the best way to model this JSON ??
Lokesh, The modeling will change a bit depending on your queries, the rate of update and your tooling (Spring-data-cassandra makes a mess of updating collections for example). I suggest asking the Cassandra users mailing list for help since this list is for development OF Cassandra. > On Mar 28, 2016, at 11:09 AM, Lokesh Ceeba - Vendor > wrote: > > Hello Team, > How to design/develop the best data model for this ? > > > var json=[{ "id":"9a55fdf6-eeab-4c83-9c6f-04c7df1b3225", >"user":"ssatish", >"event":"business", >"occurredOn":"09 Mar 2016 17:55:15.292-0600", >"eventObject": >{ >"objectType":"LOAD", >"id":"12345", >"state":"ARRIVAL", >"associatedAttrs": >[ >{ > > "type":"location_id", >"value":"100" >}, >{ > > "type":"location_type", >"value":"STORE" >}, >{ > > "type":"arrival_ts", > > "value":"2015-12-12T10:10:10" >} >] > } }] > > > I've taken this approach : > > create type event_object_0328 > ( > Object_Type text, > Object_ID Int, > Object_State text > ) > ; > > > create table Events > ( > event_id timeuuid, > event_type text, > triggered_by text, > triggered_ts timestamp, > Appl_IDtext, > eventObjectfrozen, > primary key(event_id) > ) > ; > > Now I need to build the Associated Attributes (Highlighted above in JSON > text). The Associated Attributes can be very dynamic and shall come in any > (Key,Value) pair combination. > > > > > -- > Lokesh > > This email and any files transmitted with it are confidential and intended > solely for the individual or entity to whom they are addressed. If you have > received this email in error destroy it immediately. *** Walmart Confidential > ***
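For illustration only, one possible layout that flattens the event object and keeps the dynamic associated attributes in a map<text, text>; the names are adapted from the thread's JSON, and it assumes events are only looked up by event_id:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.google.common.collect.ImmutableMap;

public class EventsModelExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Dynamic (key, value) attributes live in a map column, so new attribute
        // types need no schema change and no frozen UDT.
        session.execute("CREATE TABLE IF NOT EXISTS ks.events ("
                + " event_id timeuuid PRIMARY KEY,"
                + " event_type text,"
                + " triggered_by text,"
                + " triggered_ts timestamp,"
                + " object_type text,"
                + " object_id text,"
                + " object_state text,"
                + " associated_attrs map<text, text>)");

        session.execute(
                "INSERT INTO ks.events (event_id, event_type, triggered_by, object_type,"
                + " object_id, object_state, associated_attrs)"
                + " VALUES (now(), ?, ?, ?, ?, ?, ?)",
                "business", "ssatish", "LOAD", "12345", "ARRIVAL",
                ImmutableMap.of("location_id", "100",
                                "location_type", "STORE",
                                "arrival_ts", "2015-12-12T10:10:10"));

        cluster.close();
    }
}

If the events also need to be queried by user or by time range, the partition key would need to change accordingly, which is exactly why the query patterns have to come first.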
Consistency improvement idea for cutting down on the new user ramp up time.
I posted this Jira already, https://issues.apache.org/jira/browse/CASSANDRA-13315, but I wanted to toss it out on the mailing list at the same time to get some broader feedback.

I've been supporting users for a few years, and during that time I've had 2-5 conversations a week about what the correct consistency level is for a given user depending on their needs and use case. This usually takes quite a while and pays my bills, so I'm happy for the work, but it's occurred to me that I see three very common patterns:

1. People want maximum availability and throughput and don't care if things are inconsistent for a while.
2. People want maximum consistency and even want to be able to "read my writes".
3. People want a mix of both and can tolerate not being able to "read my writes".

There is of course a long tail of weird combinations of CLs after that with advanced users; admittedly I often dig and find issues with the consistency in their thought process as well, so they're often cutting across their own goals, but I grant there are valid tradeoffs to be had outside of the three above. Toss in the idea of global versions of these three and you arguably come to six.

So the above Jira is my attempt to address this in a larger fashion. To summarize the Jira:

1. Remove the 'serial consistency' bucket; it confuses everyone. Likewise, require a condition for inserts if SERIAL/LOCAL_SERIAL is used.
2. Add 3 new CLs that you set for both reads and writes: EVENTUALLY_CONSISTENT (LOCAL_ONE), HIGHLY_CONSISTENT (LOCAL_QUORUM), and TRANSACTIONALLY_CONSISTENT (LOCAL_SERIAL). This minimizes the amount that people need to think about CL and what the correct starting point is for their first application, and would have prevented some awful escalations I've seen. (I'm open to other names and to including global levels.)
3. Set CQLSH to HIGHLY_CONSISTENT by default. New sysadmins for Cassandra are often using CQLSH to spelunk for complaints or missing data, and while those in the know raise the CL when doing that, it's a frequent problem that the CL ONE default is a surprise to the new user. CQLSH is not often a performance-sensitive use either, and when it is, the other CLs are there.

The end goal, whatever shape this takes, is that it should match up with the expectations of people who are new to Cassandra more consistently and not require advanced learnings in distributed theory. When put this way, the correct CL choice takes seconds and is often self-evident.

-- Thanks, Ryan Svihla
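For comparison, this is roughly what a HIGHLY_CONSISTENT default would spare a new user from writing today; a sketch with the Java driver, contact point being a placeholder:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

public class DefaultConsistencyExample {
    public static void main(String[] args) {
        // Today the "highly consistent" behavior has to be opted into explicitly;
        // the proposal would make a name like HIGHLY_CONSISTENT the obvious default.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
        Session session = cluster.connect();
        // ... all reads and writes on this session now default to LOCAL_QUORUM ...
        cluster.close();
    }
}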
Re: Code quality, principles and rules
Different DI frameworks have different initialization costs; even within Spring it depends on how you wire up dependencies (did it use autowire with reflection, parse a giant XML of explicit dependencies, etc.). To back this assertion up: for a while in that community, benchmarking the performance of different DI frameworks was a thing, and you can find benchmarks galore with a quick Google. The practical cost also depends on the lifecycles used (transient versus singleton style, for example) and the features used (interceptors, depending on implementation, can get expensive).

So I think there should be some quantification of cost before a framework is considered. Something like Dagger 2, which uses codegen, I wager is only a cost at compile time (I have not benched it, but looking at its feature set, that's my guess). Spring, I know from experience, is slower on initialization time even with the most optimal settings than doing DI "by hand" at minimum, and that can sometimes be substantial.

On Mar 17, 2017 12:29 AM, "Edward Capriolo" wrote: On Thu, Mar 16, 2017 at 5:18 PM, Jason Brown wrote: > >> do we have plan to integrate with a dependency injection framework? > > No, we (the maintainers) have been pretty much against more frameworks due > to performance reasons, overhead, and dependency management problems. > > On Thu, Mar 16, 2017 at 2:04 PM, Qingcun Zhou > wrote: > > > Since we're here, do we have plan to integrate with a dependency > injection > > framework like Dagger2? Otherwise it'll be difficult to write unit test > > cases. > > > > On Thu, Mar 16, 2017 at 1:16 PM, Edward Capriolo > > wrote: > > > > > On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa wrote: > > > > > > > > > > > > > > > On 2017-03-16 10:32 (-0700), François Deliège < > franc...@instagram.com> > > > > wrote: > > > > > > > > > > To get this started, here is an initial proposal: > > > > > > > > > > Principles: > > > > > > > > > > 1. Tests always pass. This is the starting point. If we don't care > > > > about test failures, then we should stop writing tests. A recurring > > > failing > > > > test carries no signal and is better deleted. > > > > > 2. The code is tested. > > > > > > > > > > Assuming we can align on these principles, here is a proposal for > > their > > > > implementation. > > > > > > > > > > Rules: > > > > > > > > > > 1. Each new release passes all tests (no flakinesss). > > > > > 2. If a patch has a failing test (test touching the same code > path), > > > the > > > > code or test should be fixed prior to being accepted. > > > > > 3. Bugs fixes should have one test that fails prior to the fix and > > > > passes after fix. > > > > > 4. New code should have at least 90% test coverage. > > > > > > > > > First I was > > > > I agree with all of these and hope they become codified and > followed. I > > > > don't know anyone who believes we should be committing code that > breaks > > > > tests - but we should be more strict with requiring green test runs, > > and > > > > perhaps more strict with reverting patches that break tests (or cause > > > them > > > > to be flakey).
> > > > > > > > Ed also noted on the user list [0] that certain sections of the code > > > > itself are difficult to test because of singletons - I agree with the > > > > suggestion that it's time to revisit CASSANDRA-7837 and > CASSANDRA-10283 > > > > > > > > Finally, we should also recall Jason's previous notes [1] that the > > actual > > > > test infrastructure available is limited - the system provided by > > > Datastax > > > > is not generally open to everyone (and not guaranteed to be > permanent), > > > and > > > > the infrastructure currently available to the ASF is somewhat limited > > > (much > > > > slower, at the very least). If we require tests passing (and I agree > > that > > > > we should), we need to define how we're going to be testing (or how > > we're > > > > going to be sharing test results), because the ASF hardware isn't > going > > > to > > > > be able to do dozens of dev branch dtest runs per day in its current > > > form. > > > > > > > > 0: https://lists.apache.org/thread.html/ > f6f3fc6d0ad1bd54a6185ce7bd7a2f > > > > 6f09759a02352ffc05df92eef6@%3Cuser.cassandra.apache.org%3E > > > > 1: https://lists.apache.org/thread.html/ > 5fb8f0446ab97644100e4ef987f36e > > > > 07f44e8dd6d38f5dc81ecb3cdd@%3Cdev.cassandra.apache.org%3E > > > > > > > > > > > > > > > Ed also noted on the user list [0] that certain sections of the code > > itself > > > are difficult to test because of singletons - I agree with the > suggestion > > > that it's time to revisit CASSANDRA-7837 and CASSANDRA-10283 > > > > > > Thanks for the shout out! > > > > > > I was just looking at a patch about compaction. The patch was to > > calculate > > > free space correctly in case X. Compaction is not something that > requires > > > multiple nodes to test. The logic on the surface seems simple: find > > tables > > > of similar size and select them and merge them. The reality is it turns
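To make the comparison concrete, a minimal sketch of what DI "by hand" means here: plain constructor injection wired up in a few lines, with no framework on the startup path. The class names are invented for illustration and only loosely echo Cassandra concepts; they are not the actual Cassandra classes:

import java.util.ArrayList;
import java.util.List;

// The dependency is an interface, so tests can pass in a fake without any container.
interface CommitLogWriter {
    void append(String mutation);
}

class InMemoryCommitLogWriter implements CommitLogWriter {
    private final List<String> entries = new ArrayList<>();
    @Override
    public void append(String mutation) {
        entries.add(mutation);
    }
}

class WriteService {
    private final CommitLogWriter commitLog;

    // The dependency is injected through the constructor, not looked up from a singleton.
    WriteService(CommitLogWriter commitLog) {
        this.commitLog = commitLog;
    }

    void write(String mutation) {
        commitLog.append(mutation);
    }
}

public class HandWiredExample {
    public static void main(String[] args) {
        // The "container" is just a couple of constructor calls at startup;
        // a unit test would wire in a fake writer the same way.
        WriteService service = new WriteService(new InMemoryCommitLogWriter());
        service.write("example mutation");
    }
}

The startup cost here is nothing but constructor calls, which is the baseline any framework's initialization overhead should be measured against.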
Re: CQL unit tests vs dtests
The standard reasoning for unit tests is specificity of errors. A well-written test suite tells you exactly where you screwed up just from the pattern of successes and failures, often cutting down the need for a debugger. The standard rationale for system tests is validating that those units are wired up correctly. Hence the doubling up, where system tests and unit tests occasionally overlap code pathways, is considered worth it. In my experience, though, unless everyone is on board with crafting isolated unit tests well, the promised nirvana of rapid feedback, specific errors, and proper test coverage never happens.

On May 21, 2014 4:07 AM, "Sylvain Lebresne" wrote: > Just to be clear, I'm not strongly opposed to having CQL tests in the unit > tests suite per-se (I happen to find dtests easier to work with, probably > because I don't use debuggers, but I'm good with saying that this just mean > I'm crazy and shouldn't be taken into account). Having tests that are > intrinsically > the same kind of tests in two places bugs me a bit more however. We > currently > have all of our CQL tests in dtests ( > https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py) > and there is quite a bunch of them. Here we're talking about starting to > slowly > add the same kind of tests in the unit test suite. So do we start adding > every > new CQL test from now on in the unit tests? And what about the existing > tests? > > -- > Sylvain > > > > On Wed, May 21, 2014 at 4:51 AM, Brandon Williams > wrote: > > > On Tue, May 20, 2014 at 6:42 PM, Jonathan Ellis > wrote: > > > > > So my preferred approach is, unit test when possible without writing a > > lot > > > of scaffolding and mock superstructure. Mocking is your code telling > you > > to > > > write a system test. > > > > > > This. > > >
Re: Performance Difference between Batch Insert and Bulk Load
So there is a bit of a misunderstanding about the role of the coordinator in all this. If you use an UNLOGGED BATCH and all of those writes are in the same partition key, then yes, it's a savings and it acts as one mutation. If they're not, however, you're asking the coordinator node to do work the client could do, and you're potentially adding an extra network hop on several of those writes whenever the coordinator node does not happen to own the partition key (assuming your client driver is using token awareness, as it is in recent versions of the DataStax Java Driver). This also says nothing of heap pressure, and the measurable effect of large batches on node performance is in practice a problem in production clusters.

I frequently have had to switch people off using BATCH for bulk-loading style processes, and in _every_ single case it's been faster to use executeAsync... not to mention the cluster was healthier as a result.

As for the sstableloader options: since they all use the streaming protocol, and as of today the streaming protocol streams one copy to each remote node, they tend to be slower than even executeAsync in multi data center scenarios (though in a single data center they're the faster option; that said, the executeAsync approach is often fast enough).

This is all covered in a blog post https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e and the DataStax CQL docs also note that BATCH is not a performance optimization http://www.datastax.com/documentation/cql/3.1/cql/cql_using/useBatch.html

In summary, the only way UNLOGGED BATCH is a performance improvement over using async with the driver is if the batches stay within a certain reasonable size and they're all to the same partition.

On Mon, Dec 1, 2014 at 9:43 AM, Dong Dai wrote: > Thank a lot for the reply, Raj, > > I understand they are different. But if we define a Batch with UNLOGGED, > it will not guarantee the atomic transaction, and become more like a data > import tool. According to my knowledge, BATCH statement packs several > mutations into one RPC to save time. Similarly, Bulk Loader also pack all > the mutations as a SSTable file and (I think) may be able to save lot of > time too. > > I am interested that, in the coordinator server, are Batch Insert and Bulk > Loader the similar thing? I mean are they implemented in the similar way? > > P.S. I try to randomly insert 1000 rows into a simple table on my laptop > as a test. Sync Insert will take almost 2s to finish, but sync batch insert > only take like 900ms. It is a huge performance improvement, I wonder is > this expected? > > Also, I used CQLSStableWriter to put these 1000 insertions into a single > SSTable file, it costs around 2s to finish on my laptop. Seems to be pretty > slow. > > thanks! > - Dong > > > On Dec 1, 2014, at 2:33 AM, Rajanarayanan Thottuvaikkatumana < > rnambood...@gmail.com> wrote: > > > > BATCH statement and Bulk Load are totally different things. The BATCH > statement comes in the atomic transaction space which provides a way to > make more than one statements into an atomic unit and bulk loader provides > the ability to bulk load external data into a cluster. Two are totally > different things and cannot be compared. > > > > Thanks > > -Raj > > > > On 01-Dec-2014, at 4:32 am, Dong Dai wrote: > > > >> Hi, > >> > >> I have a performance question about the batch insert and bulk load.
> >> > >> According to the documents, to import large volume of data into > Cassandra, Batch Insert and Bulk Load can both be an option. Using batch > insert is pretty straightforwards, but there have not been an ‘official’ > way to use Bulk Load to import the data (in this case, i mean the data was > generated online). > >> > >> So, i am thinking first clients use CQLSSTableWriter to create the > SSTable files, then use “org.apache.cassandra.tools.BulkLoader” to import > these SSTables into Cassandra directly. > >> > >> The question is can I expect a better performance using the BulkLoader > this way comparing with using Batch insert? > >> > >> I am not so familiar with the implementation of Bulk Load. But i do see > a huge performance improvement using Batch Insert. Really want to know the > upper limits of the write performance. Any comment will be helpful, Thanks! > >> > >> - Dong > >> > > > > -- [image: datastax_logo.png] <http://www.datastax.com/> Ryan Svihla Solution Architect [image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png] <http://www.linkedin.com/pub/ryan-svihla/12/621/727/> DataStax is the fastest, most scalable distribute
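To make the executeAsync pattern concrete, a minimal sketch with the Java driver; the table, columns, and throttle size are placeholders rather than anything from this thread:

import com.datastax.driver.core.*;
import java.util.ArrayList;
import java.util.List;

public class AsyncLoadExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        PreparedStatement insert = session.prepare(
                "INSERT INTO ks.test_table (id, value) VALUES (?, ?)");

        List<ResultSetFuture> inFlight = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            // Each row goes out as its own token-aware request; futures are kept
            // so we can bound how many writes are in flight and surface errors.
            inFlight.add(session.executeAsync(insert.bind(i, "value-" + i)));
            if (inFlight.size() == 100) {          // simple throttle
                for (ResultSetFuture f : inFlight) {
                    f.getUninterruptibly();        // wait and rethrow any write error
                }
                inFlight.clear();
            }
        }
        for (ResultSetFuture f : inFlight) {
            f.getUninterruptibly();
        }
        cluster.close();
    }
}

Storing the futures and draining them is the part people most often skip; firing and forgetting is what makes "async is slower than batch" comparisons misleading.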
Re: Performance Difference between Batch Insert and Bulk Load
On Mon, Dec 1, 2014 at 1:52 PM, Dong Dai wrote: > Thanks Ryan, and also thanks for your great blog post. > > However, this makes me more confused. Mainly about the coordinators. > > Based on my understanding, no matter it is batch insertion, ordinary sync > insert, or async insert, > the coordinator was only selected once for the whole session by calling > cluster.connect(), and after > that, all the insertions will go through that coordinator. >

That's all correct, but what you're not accounting for is that if you use a token-aware client then the coordinator will likely not own all the data in a batch, ESPECIALLY as you scale up to more nodes. If you are using executeAsync and a single row, then the coordinator node will always be an owner of the data, thereby minimizing network hops. Some people now stop me and say "but the client is making those hops!", and that's when I point out "what do you think the coordinator has to do"; only now you've introduced something in the middle and prevented token awareness from doing its job. The savings in latency are particularly huge if you use more than consistency level ONE on your writes.

> If this is not the case, and the clients do more work, like distribute > each insert to different > coordinators based on its partition key. It is understandable the large > volume of UNLOGGED BATCH > will cause some bottleneck in the coordinator server. However, this should > be not hard to solve by distributing > insertions in one batch into different coordinators based on partition > keys. I will be curious why > this is not supported. >

The coordinator node does this of course today, but this is the very bottleneck to which you refer. To do what you're wanting to do and make it work, you'd have to enhance the CLIENT to make sure that all the objects in that batch were actually owned by the coordinator itself, and if you're talking about parsing a CQL BATCH on the client and splitting it out to the appropriate nodes in some sort of hyper token awareness, then you're taking a server-side responsibility (CQL parsing) and moving it to the client. Worse, you're asking for a number of bugs to occur by moving CQL parsing to the client, i.e. do all clients handle this the same way? What happens to older Thrift clients with batch? Etc., etc.

Final point: every time you do a batch you're adding extra load on the coordinator node's heap that could instead be on the client. This cannot be stated strongly enough. In production, doing large batches (say over 5k) is a wonderful way to make your node spend a lot of its time handling batches and the overhead of that process.

> > P.S. I have the asynchronous insertion tested, probably because my dataset > is small. Batch insertion > is always much better than async insertions. Do you have a general idea > how large the dataset should be > to reverse this performance comparison. >

You could be in a situation where the node owns all the data, and so can respond quickly, so it's hard to say; you can see, however, that as the cluster scales there is no way a given node will own everything in the batch unless you've designed it to be that way, either by some token-aware batch generation in the client or by only batching on the same partition key (a strategy covered in that blog).

PS Every time I've had a customer tell me batch is faster than async, it's been a code problem such as not storing futures for later, or in Python not using libev; in all cases I've gotten at least a 2x speed-up and often way more.
> - Dong > > > On Dec 1, 2014, at 9:57 AM, Ryan Svihla wrote: > > > > So there is a bit of a misunderstanding about the role of the coordinator > > in all this. If you use an UNLOGGED BATCH and all of those writes are in > > the same partition key, then yes it's a savings and acts as one mutation. > > If they're not however, you're asking the coordinator node to do work the > > client could do, and you're potentially adding an extra round hop on > > several of those transactions if that coordinator node does not happen to > > own that partition key (and assuming your client driver is using token > > awareness, as it is in recent versions of the DataStax Java Driver. This > > also says nothing of heap pressure, and the measurable effect of large > > batches on node performance is in practice a problem in production > clusters. > > > > I frequently have had to switch people off using BATCH for bulk loading > > style processes and in _every_ single case it's been faster to use > > executeAsync..not to mention the cluster was healthier as a result. > > > > As for the sstable loader options since they all use the stream
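And for the one case where UNLOGGED BATCH genuinely helps, a sketch of batching rows that share a partition key so the whole batch is applied as a single mutation on one replica set; the schema and values are placeholders:

import com.datastax.driver.core.*;
import java.util.Date;

public class SamePartitionBatchExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        PreparedStatement insert = session.prepare(
                "INSERT INTO ks.sensor_readings (sensor_id, reading_ts, value) VALUES (?, ?, ?)");

        // Every statement shares the same partition key (sensor_id), so the batch
        // lands on the replicas that own that one partition instead of fanning out.
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (int i = 0; i < 10; i++) {
            batch.add(insert.bind("sensor-42", new Date(System.currentTimeMillis() + i), (double) i));
        }
        session.execute(batch);
        cluster.close();
    }
}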
Re: Performance Difference between Batch Insert and Bulk Load
mispoke "That's all correct but what you're not accounting for is if you use a token aware client then the coordinator will likely not own all the data in a batch" should just be "That's all correct but what you're not accounting for is the coordinator will likely not own all the data in a batch" Token awareness has no effect on that fact. On Tue, Dec 2, 2014 at 9:13 AM, Ryan Svihla wrote: > > > On Mon, Dec 1, 2014 at 1:52 PM, Dong Dai wrote: > >> Thanks Ryan, and also thanks for your great blog post. >> >> However, this makes me more confused. Mainly about the coordinators. >> >> Based on my understanding, no matter it is batch insertion, ordinary sync >> insert, or async insert, >> the coordinator was only selected once for the whole session by calling >> cluster.connect(), and after >> that, all the insertions will go through that coordinator. >> > > That's all correct but what you're not accounting for is if you use a > token aware client then the coordinator will likely not own all the data in > a batch, ESPECIALLY as you scale up to more nodes. If you are using > executeAsync and a single row then the coordinator node will always be an > owner of the data, thereby minimizing network hops. Some people now stop me > and say "but the client is making those hops!", and that's when I point out > "what do you think the coordinator has to do", only you've introduced > something in the middle, and prevent token awareness from doing it's job. > The savings in latency are particularly huge if you use more than a > consistency level one on your write. > > >> If this is not the case, and the clients do more work, like distribute >> each insert to different >> coordinators based on its partition key. It is understandable the large >> volume of UNLOGGED BATCH >> will cause some bottleneck in the coordinator server. However, this >> should be not hard to solve by distributing >> insertions in one batch into different coordinators based on partition >> keys. I will be curious why >> this is not supported. >> > > The coordinator node does this of course today, but this is the very > bottleneck of which you refer. To do what you're wanting to do and make it > work, you'd have to enhance the CLIENT to make sure that all the objects in > that batch were actually owned by the coordinator itself, and if you're > talking about parsing a CQL BATCH on the client and splitting it out to the > appropriate nodes in some sort of hyper token awareness, then you're taking > a server side responsibility (CQL parsing) and moving it to the client. > Worse you're asking for a number of bugs to occur by moving CQL parsing to > the client, IE do all clients handle this the same way? what happens to > older thrift clients with batch?, etc, etc, etc. > > Final point, every time you do a batch you're adding extra load on the > heap to the coordinator node that could be instead on the client. This > cannot be stated strongly enough. In production doing large batches (say > over 5k) is a wonderful way to make your node spend a lot of it's time > handling batches and the overhead of that process. > >> >> P.S. I have the asynchronous insertion tested, probably because my >> dataset is small. Batch insertion >> is always much better than async insertions. Do you have a general idea >> how large the dataset should be >> to reverse this performance comparison. 
>> > > You could be in a situation where the node owns all the data, and so can > respond quickly, so it's hard to say, you can see however as the cluster > scales there is no way that a given node will own everything in the batch > unless you've designed it to be that way, either by some token aware batch > generation in the client or by only batching on the same partition key > (strategy covered in that blog). > > PS Every time I've had a customer tell me batch is faster than async, it's > been a code problem such as not storing futures for later, or in Python not > using libev, in all cases I've gotten at least 2x speed up and often way > more. > > >> - Dong >> >> > On Dec 1, 2014, at 9:57 AM, Ryan Svihla wrote: >> > >> > So there is a bit of a misunderstanding about the role of the >> coordinator >> > in all this. If you use an UNLOGGED BATCH and all of those writes are in >> > the same partition key, then yes it's a savings and acts as one >> mutation. >> > If they're not however, you're asking the coordinator node to do work >> the
Re: streaming got stuck at bootstrapping process
Analia, I believe your issue is most likely related to a couple of known issues in 1.2.13:

https://issues.apache.org/jira/browse/CASSANDRA-6648
https://issues.apache.org/jira/browse/CASSANDRA-6615

I'm not sure of the internals myself, but I've observed similar behavior on a number of 1.2.13 clusters. I suggest an upgrade. If you'd like to respond, I suggest doing so on the user mailing list and not the developer list, as this one is for committers.

On Tue, Dec 9, 2014 at 10:58 AM, Analia Lorenzatto < analialorenza...@gmail.com> wrote: > > Hello, > > I have a cassandra cluster comprised of 5 nodes (version 1.2.13) configured > with vnodes. One of these is down. So I've followed these steps to replace > it: > > http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_replace_node_t.html > > It was nicely being bootstrapped, but after one day teh streaming got > stuck. I do not see timeout, heapsize, exceptions errors. I just see > messages like this: > > 10.x.y.1 = dead node > 10.x.y.2 = node to be bootstrapped > > INFO 21:59:46,875 InetAddress /10.x.y.1 is now DOWN > INFO 21:59:46,875 FatClient /10.x.y.1 has been silent for 3ms, > removing from gossip > INFO 22:00:47,118 Node /10.x.y.1 is now part of the cluster > INFO 22:00:47,119 InetAddress /10.x.y.1 is now UP > WARN 22:00:47,121 Not updating token metadata for /10.x.y.1 because I am > replacing it > INFO 22:00:47,121 Nodes /10.x.y.1 and server.dom.net/10.x.y.2 have the > same token -1067333895391196053. Ignoring /10.x.y.1 > > I am not sure if they are ok. > > The output of "nodetool netstats" shows me a big list of files to be > streamed with 0% progress. > > I am out of ideas.. the node got stuck streaming but it is not already > bootstrapped. I tried this process more than once with the same result. > Any suggestion? > > Thanks in advance! > > > -- > Saludos / Regards. > > Analía Lorenzatto. > > “It's possible to commit no errors and still lose. That is not weakness. > That is life". By Captain Jean-Luc Picard. > -- [image: datastax_logo.png] <http://www.datastax.com/> Ryan Svihla Solution Architect [image: twitter.png] <https://twitter.com/foundev> [image: linkedin.png] <http://www.linkedin.com/pub/ryan-svihla/12/621/727/> DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay.
Re: How to bulkload into a specific data center?
Just noticed you'd sent this to the dev list, this is a question for only the user list, and please do not send questions of this type to the developer list. On Thu, Jan 8, 2015 at 8:33 AM, Ryan Svihla wrote: > The nature of replication factor is such that writes will go wherever > there is replication. If you're wanting responses to be faster, and not > involve the REST data center in the spark job for response I suggest using > a cql driver and LOCAL_ONE or LOCAL_QUORUM consistency level (look at the > spark cassandra connector here > https://github.com/datastax/spark-cassandra-connector ) . While write > traffic will still be replicated to the REST service data center, because > you do want those results available, you will not be waiting on the remote > data center to respond "successful". > > Final point, bulk loading sends a copy per replica across the wire, so > lets say you have RF3 in each data center that means bulk loading will send > out 6 copies from that client at once, with normal mutations via thrift or > cql writes between data centers go out as 1 copy, then that node will > forward on to the other replicas. This means intra data center traffic in > this case would be 3x more with the bulk loader than with using a > traditional cql or thrift based client. > > > > On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang wrote: > >> I set up two virtual data centers, one for analytics and one for REST >> service. The analytics data center sits top on Hadoop cluster. I want to >> bulk load my ETL results into the analytics data center so that the REST >> service won't have the heavy load. I'm using CQLTableInputFormat in my >> Spark Application, and I gave the nodes in analytics data center as >> Intialial address. >> >> However, I found my jobs were connecting to the REST service data center. >> >> How can I specify the data center? >> > > > > -- > > Thanks, > Ryan Svihla > > -- Thanks, Ryan Svihla
Re: How to bulkload into a specific data center?
The nature of replication factor is such that writes will go wherever there is replication. If you want responses to be faster, and you don't want the REST data center involved in the Spark job's response path, I suggest using a CQL driver with a LOCAL_ONE or LOCAL_QUORUM consistency level (look at the spark cassandra connector here https://github.com/datastax/spark-cassandra-connector). While write traffic will still be replicated to the REST service data center, because you do want those results available, you will not be waiting on the remote data center to respond "successful".

Final point: bulk loading sends a copy per replica across the wire. So let's say you have RF3 in each data center; that means bulk loading will send out 6 copies from that client at once, whereas normal mutations via Thrift or CQL go out between data centers as a single copy, and the remote node then forwards it on to the other replicas. This means inter-data center traffic in this case would be 3x more with the bulk loader than with a traditional CQL or Thrift based client.

On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang wrote: > I set up two virtual data centers, one for analytics and one for REST > service. The analytics data center sits top on Hadoop cluster. I want to > bulk load my ETL results into the analytics data center so that the REST > service won't have the heavy load. I'm using CQLTableInputFormat in my > Spark Application, and I gave the nodes in analytics data center as > Intialial address. > > However, I found my jobs were connecting to the REST service data center. > > How can I specify the data center? > -- Thanks, Ryan Svihla
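A sketch of pinning a Spark job to the analytics data center with the connector; the host and DC name are placeholders, and the property names are the connector's documented connection/output settings, so double-check them against the connector version in use:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AnalyticsDcExample {
    public static void main(String[] args) {
        // Contact points are nodes in the analytics DC; local_dc keeps the connector
        // from routing requests to the REST DC, and LOCAL_ONE means the job only
        // waits on analytics-DC replicas (replication still copies data to both DCs).
        SparkConf conf = new SparkConf()
                .setAppName("etl-load")
                .setMaster("local[2]")                                         // placeholder; normally set by spark-submit
                .set("spark.cassandra.connection.host", "10.0.0.1")            // placeholder analytics node
                .set("spark.cassandra.connection.local_dc", "ANALYTICS")       // placeholder DC name
                .set("spark.cassandra.output.consistency.level", "LOCAL_ONE");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... build the ETL RDD/DataFrame and write it with the connector ...
        sc.stop();
    }
}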