Re: Staging website at cassandra.staged.apache.org
Thanks Mick, is there documentation somewhere on how we update the website? A - Aaron Morton New Zealand @aaronmorton CEO Apache Cassandra Consulting http://www.thelastpickle.com On Tue, 21 Apr 2020 at 18:40, Mick Semb Wever wrote: > For our cassandra-website repository, any changes to our website can now > first be staged at https://cassandra.staged.apache.org/ > > The staged website comes from the content/ directory on the `asf-staging` > branch. > > regards, > Mick >
Re: [VOTE] Project governance wiki doc (take 2)
+1 - Aaron Morton New Zealand @aaronmorton CEO Apache Cassandra Consulting http://www.thelastpickle.com On Thu, 25 Jun 2020 at 19:46, Benedict Elliott Smith wrote: > The purpose of this document is to define only how the project makes > decisions, and it lists "tenets" of conduct only as a preamble for > interpreting the rules on decision-making. The authors' intent was to lean > on this to minimise the rigidity and prescriptiveness in the formulation of > the rules (so that we could e.g. use "reasonable" repeatedly, instead of > specifying precise expectations), in part because this is our first attempt > to codify such rules, and in part because rigidity can cause unnecessary > friction to a project that mostly runs smoothly. > > The document provides an avenue for resolving disputes in decision-making > when these assumptions on behaviour breakdown. However its scope definitely > isn't, at least in my opinion, addressing misbehaviour by individuals (i.e. > one of the serious breaches listed in part 5 of the Apache CoC), which it > seems to me you are addressing here? > > Since we reference the ASF CoC, and the ASF provides its own guide for > handling CoC complaints (including within projects), that applies to that > very CoC (and which you referenced), it's unclear to me what you're looking > for. Are you looking for a more project-specific CoC with different > guidelines for reporting? This is something you would be welcome to > undertake, and seek consensus for. > > > > > On 25/06/2020, 02:38, "Dinesh Joshi" wrote: > > > On Jun 24, 2020, at 6:01 PM, Brandon Williams > wrote: > > > > On Wed, Jun 24, 2020 at 5:43 PM Dinesh Joshi > wrote: > >> 1. How/Who/Where are we planning to deal with Code of Conduct > violations? I assume this should be private@ but the document does not > call it out as such. We should call it out explicitly as part of the PMC > responsibilities. We should also clarify how and where are CoC violations > against PMC members reported and handled? Should they go to ASF? > > > > I think if we assume good intent, this will be a non-issue. People > > may make mistakes, but I try to have faith they will realize them and > > act accordingly when told so without any need to escalate. > > We need to spell out in the document how and where the CoC violations > are reported irrespective of the role of the person in the community. This > is a critical point to address. ASF spells this out very clearly[1]. We > should have a similar statement in the Project Governance document, > otherwise it feels incomplete to me. > > Dinesh > > [1] > http://www.apache.org/foundation/policies/conduct.html#reporting-guidelines > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >
Re: [VOTE] Release Apache Cassandra 2.1.8
> 2.1.8 release vote right on top of 2.1.6 and 2.1.7. I haven't dug into the specific issues, but given the small list of changes and release velocity, those two older releases should probably be considered an "upgrade now" trigger with clients. Thanks for the heads up. Guess we should keep a list of this sort of thing somewhere. A ----- Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On Mon, Jul 6, 2015 at 12:52 PM, Gary Dusbabek wrote: > +1 > > On Mon, Jul 6, 2015 at 12:04 PM, Jake Luciani wrote: > > > I propose the following artifacts for release as 2.1.8. > > > > sha1: db39257c34152f6ccf8d53784cea580dbfe1edad > > Git: > > > > > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.1.8-tentative > > Artifacts: > > > > > https://repository.apache.org/content/repositories/orgapachecassandra-1063/org/apache/cassandra/apache-cassandra/2.1.8/ > > Staging repository: > > > https://repository.apache.org/content/repositories/orgapachecassandra-1063/ > > > > The artifacts as well as the debian package are also available here: > > http://people.apache.org/~jake > > > > The vote will be open for 72 hours (longer if needed). > > > > [1]: http://goo.gl/BFYiEO (CHANGES.txt) > > [2]: http://goo.gl/24XaPp (NEWS.txt) > > >
Re: Problem while configuring key and row cache?
Use info…. $ bin/nodetool -h localhost info … Key Cache: size 672 (bytes), capacity 52428768 (bytes), 12 hits, 17 requests, 0.706 recent hit rate, 14400 save period in seconds Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/08/2012, at 5:18 PM, Amit Handa wrote: > Hi, > > Thanks Jonathan for your reply. > I modified key_cache_size_in_mb and row_cache_size_in_mb values inside > cassandra.yaml. but not able to see it's effect using command " *./nodetool > -h 107.108.189.212 cfstats*". Can u let me know how to verify that the > setting for key_cache_size and row_chache_size has taken place. > > With Regards, > Amit > > > On Tue, Aug 21, 2012 at 8:19 PM, Jonathan Ellis wrote: > >> setcachecapacity is obsolete in 1.1+. Looks like we missed removing >> it from nodetool. See >> http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 for >> background. >> >> (Moving to users@.) >> >> On Tue, Aug 21, 2012 at 8:19 AM, Amit Handa wrote: >>> I started exploring apache cassandra 1.1.3. I am facing problem with how >> to >>> improve performance of cassandra using caching configurations. >>> I tried setting following configurations: >>> >>> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 25 0 >>> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 0 25 >>> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 25 >> 25 >>> ./nodetool -h 107.108.189.204 setcachecapacity DemoUser Users 444 444 >>> >>> >>> But when i am checking that this particula configuration are really been >>> configured using command: >>> ./nodetool -h 107.108.189.212 cfstats >>> >>> it's showing following results for keySpace DemoUser and column Family >>> Users: >>> *Keyspace: DemoUser >>>Read Count: 21914 >>>Read Latency: 0.08268495026010769 ms. >>>Write Count: 87656 >>>Write Latency: 0.06009481381765082 ms. >>>Pending Tasks: 0 >>>Column Family: Users >>>SSTable count: 1 >>>Space used (live): 1573335 >>>Space used (total): 1573335 >>>Number of Keys (estimate): 22016 >>>Memtable Columns Count: 0 >>>Memtable Data Size: 0 >>>Memtable Switch Count: 1 >>>Read Count: 21914 >>>Read Latency: 0.083 ms. >>>Write Count: 87656 >>>Write Latency: 0.060 ms. >>>Pending Tasks: 0 >>>Bloom Filter False Postives: 0 >>>Bloom Filter False Ratio: 0.0 >>>Bloom Filter Space Used: 41104 >>>Compacted row minimum size: 150 >>>Compacted row maximum size: 179 >>>Compacted row mean size: 179 * >>> >>> I am unable to see the effect of above setcachecapacity command. Let me >>> know how i can configure the cache capacity, and check it's effect. >>> >>> With Regards, >>> Amit >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >>
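To make the 1.1 change concrete: the caches are global now and sized in cassandra.yaml rather than per column family, so the check is the yaml values plus nodetool info after a restart. A minimal sketch, with placeholder numbers and the defaults quoted from memory of the 1.1 config comments:

    # cassandra.yaml (1.1.x): global cache sizes; an empty key_cache_size_in_mb
    # means "auto" (roughly min(5% of heap, 100MB))
    key_cache_size_in_mb: 100
    row_cache_size_in_mb: 0    # 0 disables the row cache

    # after restarting the node, verify with:
    $ bin/nodetool -h localhost info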
Re: Batch Truncate using Hector 1.0-5
The hector user list is the best place for this question https://groups.google.com/forum/?fromgroups#!forum/hector-users Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 22/11/2012, at 8:53 AM, Amitabha Karmakar wrote: > Hi, > > Is there any way I could do a batch truncate using hector 1.0-5 ? > > Thanks !
Re: Proposal: require Java7 for Cassandra 2.0
+1 - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 7/02/2013, at 11:21 AM, Jonathan Ellis wrote: > Java 6 EOL is this month. Java 7 will be two years old when C* 2.0 > comes out (July). Anecdotally, a bunch of people are running C* on > Java7 with no issues, except for the Snappy-on-OS-X problem (which > will be moot if LZ4 becomes our default, as looks likely). > > Upgrading to Java7 lets us take advantage of new (two year old) > features as well as simplifying interoperability with other > dependencies, e.g., Jetty's BlockingArrayQueue requires java7. > > Thoughts? > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder, http://www.datastax.com > @spyced
Re: ApacheCon North America
I'll be there from the evening on the Wednesday 27th to Friday 1st midday. Talking on Thursday afternoon about C* internals. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 12/02/2013, at 4:26 AM, Eric Evans wrote: > Hi All > > It's now about 2 weeks until ApacheCon North America, which is taking > place Sunday 24th Feb - Thursday 28th in Portland. Quite a few people > from our project will be there, and we'd love to see you all! > > If you haven't already registered for the conference, then we've some > good news - we've managed to snag a 20% discount for you! To register > with the 20% off, use code PMC or the link > http://acna13.eventbrite.com/?discount=PMC > > To see what the talks are, including the ones relating to Cassandra, > please see the schedule -http://na.apachecon.com/schedule/ > > Would you like to get more involved in the project? A number of > people will be at the (Free!) Hackathon on the Monday. Ours will focus > on CQL drivers, but if you would like to learn more about > contributing, get some mentoring on a patch, or help collaborate on > some fixes, then by all means come join us. If you'd like to come, > whether you can make it to the main conference or not, the details are > on the ApacheCon wiki: http://wiki.apache.org/apachecon/HackathonNA13 > > Also talking of free, there will be a BarCamp on the Sunday. This is > open to everyone, Portland natives and conference-goers alike, and > should be a great chance to share new ideas and learn about existing + > upcoming projects. To sign up to come to that, or learn more, it's > http://wiki.apache.org/apachecon/BarCampApachePortland > > Hopefully see some of you in Portland in a few weeks! > --- > > Thanks > > -- > Eric Evans > Acunu | http://www.acunu.com | @acunu
Re: Rename failed while cassandra is starting up
Replying on the user group. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 14/04/2013, at 3:50 PM, Boris Yen wrote: > Hi All, > > Recently, we encountered an error on 1.0.12 that prevented cassandra from > starting up. From the log messages, it looked like the table/keyspace was > opened before the scrubDataDirectories was executed. This created a race > condition between two threads. One was trying to rename files while the > other was trying to remove tmp files. I was wondering if anyone could > provide us some information or workaround for this. > > INFO [MemoryMeter:1] 2013-04-09 02:49:39,868 Memtable.java (line 186) > CFS(Keyspace='fmzd', ColumnFamily='alarm.fmzd_alarm_category') liveRatio is > 3.7553409423470883 (just-counted was 3.1413828689370487). calculation took > 2ms for 265 columns > INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,868 SSTableReader.java (line > 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-2 (83 bytes) > INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,868 SSTableReader.java (line > 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshRole-hd-1 (123 bytes) > INFO [Creating index: alarm.fmzd_alarm_category] 2013-04-09 02:49:39,874 > ColumnFamilyStore.java (line 705) Enqueuing flush of > Memtable-alarm.fmzd_alarm_category@413535513(14025/65835 serialized/live > bytes, 275 ops) > INFO [OptionalTasks:1] 2013-04-09 02:49:39,877 SecondaryIndexManager.java > (line 184) Creating new index : ColumnDefinition{name=6d65736853534944, > validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, > index_name='fmzd_ap_meshSSID'} > INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,895 SSTableReader.java (line > 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-1 (122 bytes) > INFO [SSTableBatchOpen:2] 2013-04-09 02:49:39,896 SSTableReader.java (line > 153) Opening /test/db/data/fmzd/ap.fmzd_ap_meshSSID-hd-2 (82 bytes) > INFO [OptionalTasks:1] 2013-04-09 02:49:39,900 SecondaryIndexManager.java > (line 184) Creating new index : > ColumnDefinition{name=6d6f62696c6974795a6f6e654944, > validator=org.apache.cassandra.db.marshal.UTF8Type, index_type=KEYS, > index_name='fmzd_ap_mobilityZoneUUID'} > ERROR [FlushWriter:1] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java > (line 139) Fatal exception in thread Thread[FlushWriter:1,5,main] > java.io.IOError: java.io.IOException: rename failed of > /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db > at > org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:375) > at > org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319) > at > org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302) > at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:276) > at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49) > at org.apache.cassandra.db.Memtable$4.runMayThrow(Memtable.java:299) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > Caused by: java.io.IOException: rename failed of > /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-hd-21-Data.db > at > org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.java:355) > at > org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:371) > ... 
9 more > INFO [SSTableBatchOpen:1] 2013-04-09 02:49:39,917 SSTableReader.java (line > 153) Opening /test/db/data/fmzd/ap.fmzd_ap_mobilityZoneUUID-hd-1 (312 bytes) > INFO [FlushWriter:2] 2013-04-09 02:49:39,916 Memtable.java (line 246) > Writing Memtable-alarm.fmzd_alarm_alarmCode@402202831(2958/22542 > serialized/live bytes, 58 ops) > ERROR [main] 2013-04-09 02:49:39,916 AbstractCassandraDaemon.java (line > 373) Exception encountered during startup > java.io.IOError: java.io.IOException: Failed to delete > /test/db/data/fmzd/alarm.fmzd_alarm_alarmCode-tmp-hd-21-Statistics.db > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:372) > at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:415) > at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:193) > at > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:356) > at > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) > Caused by: java.io.IOException: Failed to delete > /test/db/data/fmzd/alarm.fmzd_ala
wiki access
Hi my wiki access has somehow died, my user name is aaronmorton. Could you please reset my password or generate a new account. Thanks Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: wiki access
It was the case sensitivity. Weird because I was in 1Password. In now, thanks. Cheers Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 30/05/2014, at 6:58 pm, Jonathan Ellis wrote: > Is it case sensitive? We have you as AaronMorton. > > We can whitelist a new account if you create one. > > On Fri, May 30, 2014 at 5:25 AM, Aaron Morton wrote: >> Hi my wiki access has somehow died, my user name is aaronmorton. >> >> Could you please reset my password or generate a new account. >> >> Thanks >> Aaron >> >> - >> Aaron Morton >> New Zealand >> @aaronmorton >> >> Co-Founder & Principal Consultant >> Apache Cassandra Consulting >> http://www.thelastpickle.com >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder, http://www.datastax.com > @spyced
Re: Hadoop package exposed through thrift
I'm not up to speed with Hadoop in Cassandra, but regular Hadoop provides a IO stream interface so it can be used with non Java languages. http://hadoop.apache.org/common/docs/r0.15.2/streaming.html That may be of help. Aaron On 9 Jun 2010, at 09:53, Jeremy Hanna wrote: > I just didn't know if there were any way to make it easier for the non-java > crowd to take advantage of it. I'll give it some more thought. > > On Jun 8, 2010, at 4:05 PM, Jonathan Ellis wrote: > >> exposing it through thrift would mean the path would be >> >> client >> to cassandra [processing thrift command] >> to hadoop [giving it a job] >> to cassandra [fetching the data] >> to hadoop [m/r] >> to cassandra [handing result back] >> to client >> >> it just doesn't seem like a good design to me. >> >> additionally, thrift is meant more for "stuff your app is doing >> constantly" while hadoop handles analytics queries. this separation >> of duties makes a lot of sense to me. >> >> On Tue, Jun 8, 2010 at 1:45 PM, Jeremy Hanna >> wrote: >>> When I gave a presentation on cassandra+hadoop, some ruby folks were >>> wondering about the possibility of using the MapReduce functionality in a >>> language other than Java. >>> >>> I was just wondering if any thought was given to exposing the >>> org.apache.cassandra.hadoop functionality through thrift. That way the >>> MapReduce code could be used by several languages and secondarily by client >>> authors. >>> >>> I'm just trying to see if there is any reason why it wasn't exposed through >>> thrift or if more needs to be done before it could be exposed to languages >>> other than Java. >>> >>> Thanks, >>> >>> Jeremy >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of Riptano, the source for professional Cassandra support >> http://riptano.com >
Re: Secondary indexing and 0.6/0.7 integration with Datanucleus
I've not read up on the secondary indexes, but am doing something similar. I got some inspiration from the Lucandra project. You will probably need to make multiple calls to Cassandra for each clause of your query. The design I used had two CFs; the rough idea was: in the TermDocIndex the key is the term (e.g. lastName=Smith) and the column names are the keys for the object/document the term is from, e.g. key1. The DocTermIndex uses the object/doc id as the key and has columns for each term the document contains, e.g. "lastName=Smith". I also maintained some stats on how many objects/documents had the term (using redis, will move to cassandra counters in 0.7 perhaps). The query process then becomes: 1. Determine the most selective term in the query using the stats. 2. Do a get_slice to get the first X (1000 perhaps) column values from the TermDocIndex using the term key. 3. Use the keys from step 2 in a multiget_slice against the DocTermIndex, passing the list of keys from 2 and listing the remaining terms as the column names you want to get back. 4. From the result of 3, filter out all keys that returned fewer columns than we asked for. 5. Repeat from 3 if needed. (A self-contained sketch of this flow is included after this message.) I was hoping the limit in step 2 would bound the queries into the cluster, and the multiget in step 3 would be better at distributing most of the work around the cluster. E.g. rather than reading 1000 columns from, say, 3 keys, it reads 3 columns from 1000 keys. Aaron On 16 Jun 2010, at 16:57, Todd Nine wrote: > No problem, > I didn't want to implement my own solution if an existing one could > easily be applied. Since I'll be creating CF that represent secondary > indexes, I'll need to perform range scans over the keys of those > secondary index CFs. The column names within the CF's are the row keys > of the primary table. Is there a way I can get the intersection of all > of the column names from multiple ranges scans over different column > families in one result set? Otherwise I'll need to make multiple trips > and create the intersection myself in my plugin. Here is an example of > what I'm trying to do. > > CF: Person > > key1: { > firstName: John > lastName: Smith > email: smi...@foo.com > } > > key2: { > firstName: Jane > lastName: Smith > email: smi...@foo.com > } > > key3: { > firstName: Jane > lastName: Doe > email: smi...@foo.com > } > > > My secondary index tables would be the following > > CF: Person_LastName > > Smith:{ > key1: 0x00 > key2: 0x00 > } > > Doe: { > key3:0x00 > } > > CF: Person_Email > smi...@foo.com:{ >key1:0x00 >key2:0x00 >key3:0x00 > } > > If my input is something similar to lastName == 'Smith' && email == > "smi...@foo.com", I would return all columns from key "Smith" in CF > Person_LastName, and all columns from key "smi...@foo.com" in CF > Person_Email. The intersection of the two sets is key1, and key2, and > have cassandra only return those rows. > > Thanks, > Todd > > > > > > On Tue, 2010-06-15 at 23:38 -0500, Jonathan Ellis wrote: > >> No chance that 749 can be backported to 0.6, sorry. >> >> On Tue, Jun 15, 2010 at 10:35 PM, Todd Nine wrote: >> >>> Lets try that again. >>> >>> This is the intended issue. >>> >>> https://issues.apache.org/jira/browse/CASSANDRA-749 >>> >>> thanks, >>> Todd >>> >>> >>> >>> On Tue, 2010-06-15 at 20:02 -0500, Jonathan Ellis wrote: >>> >>> What issue were you trying to link? :) >>> >>> On Tue, Jun 15, 2010 at 6:56 PM, Todd Nine wrote: Hi all, I'm implementing a Datanucleus plugin for Cassandra. 
I'm finished with the basic functionality, and everything seems to work pretty well. Now my issue is performing secondary indexing on fields within my data. I have outlined some of the issues I'm facing in this post. http://www.datanucleus.org/servlet/forum/viewthread_thread,6087_lastpage,yes#32610 Essentially, for each operand the user specifies, I will need to make a trip to Cassandra, load the key columns, then perform an intersection with the result from my previous read. Eventually at the end of all the intersections, I will have a list of keys I will then load. This obviously requires several trips to Cassandra, where from my understanding of secondary indexing, I would only need to make one trip for multiple operands over a column family.I've read over this issue. http://issues.apache.org/jira/browse/CASSANDRA-32610 And it seems to solve a lot of my woes. Is it possible/recommended to patch the current code base of 0.6.2 to perform this functionality? Thanks, Todd >>> >>> >>> >>> >>> >> >>
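A rough, self-contained sketch of the TermDocIndex / DocTermIndex flow described at the top of this thread. To keep it runnable on its own it models the two CFs as in-memory maps instead of real get_slice / multiget_slice calls, so treat every name in it as illustrative rather than as working client code:

    import java.util.*;

    public class TermIndexQuerySketch
    {
        // TermDocIndex: term -> ordered set of object/document keys containing that term
        static final SortedMap<String, SortedSet<String>> termDocIndex = new TreeMap<>();
        // DocTermIndex: object/document key -> the set of terms it contains
        static final Map<String, Set<String>> docTermIndex = new HashMap<>();

        // Steps 1-5: pick the most selective term, page its keys out of TermDocIndex,
        // then keep only keys whose DocTermIndex row contains every remaining term.
        static List<String> query(Set<String> terms, int pageSize)
        {
            // 1. most selective term = the one with the fewest matching keys (the "stats" lookup)
            String driver = Collections.min(terms, Comparator.comparingInt(
                    (String t) -> termDocIndex.getOrDefault(t, new TreeSet<String>()).size()));
            Set<String> rest = new HashSet<>(terms);
            rest.remove(driver);

            List<String> result = new ArrayList<>();
            List<String> page = new ArrayList<>();
            // 2. slice candidate keys for the driving term, pageSize at a time
            Iterator<String> candidates = termDocIndex.getOrDefault(driver, new TreeSet<String>()).iterator();
            while (candidates.hasNext())
            {
                page.add(candidates.next());
                if (page.size() == pageSize || !candidates.hasNext())
                {
                    // 3./4. "multiget" the candidates and drop any key missing one of the other terms
                    for (String key : page)
                        if (docTermIndex.getOrDefault(key, Collections.<String>emptySet()).containsAll(rest))
                            result.add(key);
                    page.clear(); // 5. repeat with the next slice of candidates
                }
            }
            return result;
        }

        public static void main(String[] args)
        {
            docTermIndex.put("key1", new HashSet<>(Arrays.asList("lastName=Smith", "email=smi...@foo.com")));
            docTermIndex.put("key3", new HashSet<>(Arrays.asList("lastName=Doe", "email=smi...@foo.com")));
            for (Map.Entry<String, Set<String>> e : docTermIndex.entrySet())
                for (String term : e.getValue())
                    termDocIndex.computeIfAbsent(term, t -> new TreeSet<String>()).add(e.getKey());

            // lastName == 'Smith' && email == 'smi...@foo.com' -> [key1]
            System.out.println(query(new HashSet<>(Arrays.asList("lastName=Smith", "email=smi...@foo.com")), 1000));
        }
    }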
Re: Atomic Compare and Swap
I've been playing with something like CAS, it's not the same but it may be of interest. I write some data into Cassandra with quorum or better consistency; that allows me to assert what it should look like when read back. If the assertion holds I can then go ahead. For example, in a CF with TimeUUID ordering the client writes a column against the key of the thing we want to update. This write does not store the value. Then read back the first ordered column; if its name is my UUID then I can proceed, otherwise delete the column. (A self-contained sketch of this check is included after this message.) If you know the UUID of the last update you can read back two columns, then assert yours is the first and the previous update is the second. Perhaps if you were doing a CAS you could then write the actual value you want to update and somehow store the UUID from above with it. Say as a col in another col family, with the name as the UUID and the value as the value. To read, get the first column from both CFs as a multiget; the col names must match from both for the value to be correct (could just use two diff keys in the same CF). Hope that makes sense. Aaron On 23/06/2010, at 4:27 PM, Mike Malone wrote: I'd be interested in what the folks who want CAS implementations think about vector clocks. Can you use them to fulfill your use cases? If not, why not? I ask because I have found myself wanting CAS in Cassandra too, but I think that's only because I'm pretty familiar with HTTP. I think vector clocks with client merge give you essentially the same functionality, but in a way that fits much more nicely with the rest of the Cassandra architecture. CAS really exacerbates Cassandra's weaknesses. Mike On Tue, Jun 22, 2010 at 4:52 PM, Rishi Bhardwaj wrote: S>: An *atomic* CAS is another beast and I see at least two difficulties: S>: 1) making it atomic locally: Cassandra's implementation is very much multi-threaded. On a given node, while you're reading-comparing-and-swapping on some column c, no other thread should be allowed to write c (even 'normal' write). You would probably need to have specific column families where CAS is allowed and for which all writes would be slower (since some locking would be involved). Even then, making such locking efficient and right is not easy. But in the end, local atomicity is quite probably the easy part. R: I am curious as to how does Cassandra handle two concurrent writes to the same column right now? Is there any locking on the write path to serialize two writes to the same column? If there is any locking then CAS can build on that. If there is no such locking then we could exclude normal writes from the synchronization/locking required for CAS. So the normal write path remains the same, and we let the client know that atomic CAS wouldn't work if normal writes are also happening on the same column values. In short a client should not mix normal writes with Atomic CAS for writing some column value. This will hopefully make things simpler. S:>2) making it atomic cluster-wide: data is replicated and an atomic CAS would need to apply on the exact same column version in every node. Which, with eventual consistency especially, is pretty hard to accomplish unless you're locking the cluster (but that's what Cages/ZK do). R: For starters it would be great if atomic CAS could work for consistency level Quorum and ALL and not be supported for other consistency levels. Even for other consistency levels what would stop CAS to work? Why would one require cluster wide locking? 
I might be mistaken here but the atomic CAS operation would happen individually at all the replica nodes (either directly or through hinted writes) and would succeed or fail depending on the timestamp/version of the column at the replica. If we do Quorum reads and CAS writes then we can also be sure about consistency. S:>That being said, if you have a neat solution for efficient and distributed atomic CAS that doesn't require rewriting 80% of Cassandra, I'm sure there will be interest in that. R: That sounds great. I am definitely going to look into this and report back if I have a good solution. Thanks, Rishi From: Sylvain Lebresne To: dev@cassandra.apache.org Sent: Tue, June 22, 2010 1:21:51 AM Subject: Re: Atomic Compare and Swap On Mon, Jun 21, 2010 at 11:19 PM, Rishi Bhardwaj > wrote: I have read the post on cages and it is definitely very interesting. But cages seems to be too coarse grained compared to an Atomic Compare and Swap on Cassandra column value. Cages would makes sense when one wants to do multiple atomic row, column updates. Also, I am not so sure about the scalability when it comes to using zookeeper for keeping locks on Cassandra columns... there would also be performance hit with an added RPC for every write. I feel Cages maybe fine for systems when one has few locks but I feel an atomic CAS in
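A rough, self-contained sketch of the write-then-check-the-first-column trick from the top of this thread. It models the TimeUUID-ordered row as an in-memory sorted map rather than real quorum reads and writes (which is exactly the hard part being debated here), and a timestamp-plus-random string stands in for the TimeUUID column name; it is an illustration of the idea, not code against any client:

    import java.util.UUID;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class FirstColumnWinsSketch
    {
        // Stand-in for one row of a TimeUUID-comparator CF: column name -> column value.
        // The map is sorted, so firstKey() plays the role of "the first ordered column".
        static final ConcurrentSkipListMap<String, String> row = new ConcurrentSkipListMap<>();

        // Write our marker column, read back the first ordered column, and only proceed
        // if that first column is ours; otherwise remove our marker and back off.
        static boolean tryClaim(String myMarker)
        {
            row.put(myMarker, "");          // the write that "does not store the value"
            String first = row.firstKey();  // read back the first ordered column
            if (first.equals(myMarker))
                return true;                // ours: safe to go ahead with the real update
            row.remove(myMarker);           // someone else got there first: delete the column
            return false;
        }

        // Timestamp plus a random suffix, zero-padded so lexical order matches time order;
        // this is only a stand-in for a real TimeUUID column name.
        static String newMarker()
        {
            return String.format("%020d-%s", System.nanoTime(), UUID.randomUUID());
        }

        public static void main(String[] args)
        {
            String a = newMarker();
            String b = newMarker();
            System.out.println(tryClaim(a)); // true: a is the first ordered column
            System.out.println(tryClaim(b)); // false: a is still first, so b deletes itself
        }
    }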
Re: Cassandra and Lucene
You may need to provide some more information. What's the cluster configuration, what version, what's in the logs, etc.? Aaron On 24 Jul, 2010, at 03:40 AM, Michelan Arendse wrote: Hi I have recently started working on Cassandra as I need to make a distribute Lucene index and found that Lucandra was the best for this. Since then I have configured everything and it's working ok. Now the problem comes in when I need to write this Lucene index to Cassandra or convert it so that Cassandra can read it. The test index is 32 gigs and i find that Cassandra times out alot. What happens can't Cassandra take that load? Please any help will be great. Kind Regards,
Re: Cassandra and Lucene
Sorry, also moving to User list. Aaron On 26 Jul, 2010, at 12:14 PM, Aaron Morton wrote: You may need to provide some more information. What's the cluster configuration, what version, what's in the logs etc. Aaron On 24 Jul, 2010, at 03:40 AM, Michelan Arendse wrote: Hi I have recently started working on Cassandra as I need to make a distribute Lucene index and found that Lucandra was the best for this. Since then I have configured everything and it's working ok. Now the problem comes in when I need to write this Lucene index to Cassandra or convert it so that Cassandra can read it. The test index is 32 gigs and i find that Cassandra times out alot. What happens can't Cassandra take that load? Please any help will be great Kind Regards,
Re: Having Problems installing Chiton
You need to have the python thrift client and the generated cassandra thrift library in the python path. To get the thrift library I followed this guide http://wiki.apache.org/cassandra/InstallThrift There may be an easier way though. It looks like the Telephus client includes the cassandra package but not the thrift package. AaronOn 24 Aug, 2010,at 05:42 PM, durga devi wrote:Sir/Madam, I am new to Ubuntu. I am getting the following Problem when insatlling the chiton in ubuntu 10.4 From this link http://tinyurl.com/24gdgkv I set the PYHTONPATH as export PYTHONPATH=/home/durga/driftx- Telephus-fb32fc7/:/home/durga/driftx-chiton-bd91965/:/usr/bin/python/ And i run the /driftx-chiton-bd91965/bin/./chiton-client I had the following Problem http://pastebin.com/29T12wef I am unble to sort out where this problem is occurring while in installation. Thanks & Regards, B.Durgadevi
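For example, assuming the Thrift python package was built from source per that guide and the Cassandra bindings were generated from interface/cassandra.thrift, the path ends up looking something like the line below; the directories are placeholders, use wherever your build actually put things:

    # hypothetical locations, adjust to your build output
    export PYTHONPATH=$PYTHONPATH:/path/to/thrift/lib/py/build/lib.linux-x86_64-2.6:/path/to/gen-py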
Re: Build an index to join two CFs
I cannot tell you where in the code to make these changes. But it sounds like you want to fork cassandra and turn it into a RDBMS. It would undoubtedly be easier to just use a RDBMS. Rather than have two CF's, address and name, just have one for the person using a super CF. Pull back the entire row for the id. Denormalise your data so the query is answered by one slice request to one CF, then you do not need joins. If you want some advice on the data model, move the discussion to the user list. Aaron On 11 Sep 2010, at 09:01, Alvin UW wrote: > Hello, > > I am going to build an index to join two CFs. > First, we see this index as a CF/SCF. The difference is I don't materialise > it. > Assume we have two tables: > ID_Address(*Id*, address) , Name_ID(*name*, id) > Then,the index is: Name_Address(*name*, address) > > When the application tries to query on Name_Address, the value of "name" is > given by the application. > I want to direct the read operation to Name_ID to get "Id" value, then go > to ID_Address to > get the "address" value by the "Id" value. So far, I consider only the read > operation. > By this way, the join query is transparent to the user. > > So I think I should find out which methods or classes are in charge of the > read operation in the above operation. > For example, the operation in cassandra CLI "get > Keyspace1.Standard2['jsmith']" calls exactly which methods > in the server side? > > I noted CassandraServer is used to listen to clients, and there are some > methods such as get(), get_slice(). > Is it the right place I can modify to implement my idea? > > Thanks. > > Alvin
system tests on osx
Anyone had trouble running the test/system/test_thrift_server.py tests on a MacBook? I was trying last night and they would sometimes work, sometimes not, without me making any changes. They were failing with errors such as connection reset and TSocket read 0 bytes at different times. I've been able to run them at work (Ubuntu 0.4) OK. Just wanted to check if there were any known issues before I spend more time digging into it. Thanks Aaron
Re: Help on dynamic creation of CF
Moving to the User List Aaron On 13 Oct 2010, at 18:44, gagandip Singh wrote: > I am also new to the Cassandra world but I think that is not possible on 0.6 > version. This is feature is provided in 0.7 version which is in beta right > now. You can download it from Cassandra site. > > Thanks, > Gagan > > On Wed, Oct 13, 2010 at 11:05 AM, Wicked J wrote: > >> Hi, >> I'm using Cassandra v0.6.4 and wondering how can my app. dynamically create >> Column Families? >> >> Thanks! >>
/var/tmp in FailureDetector
I was reading through some code and noticed the following in FailureDetector.dumpInterArrivalTimes() FileOutputStream fos = new FileOutputStream("/var/tmp/output-" + System.currentTimeMillis() + ".dat", true); If this is meant to be cross-platform I'm happy to create a bug and change it to use File.createTempFile(). Also I could not find any use of the dumpInterArrivalTimes(InetAddress ep) overload. Anyone know if it should be kept? thanks Aaron
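Not a patch, just a sketch of the sort of change I have in mind, so the file lands wherever the JVM's temp directory points instead of a hard-coded /var/tmp:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class TempFileSketch
    {
        public static void main(String[] args) throws IOException
        {
            // lands in java.io.tmpdir (/tmp, %TEMP%, ...) rather than a hard-coded /var/tmp
            File output = File.createTempFile("output-" + System.currentTimeMillis() + "-", ".dat");
            FileOutputStream fos = new FileOutputStream(output, true);
            fos.write("inter-arrival times would go here".getBytes("UTF-8"));
            fos.close();
            System.out.println("wrote " + output.getAbsolutePath());
        }
    }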
Re: /var/tmp in FailureDetector
I should have mentioned the FailureDetectorMBean only has the parameterless dumpInterArrivalTimes(). The overload that takes InetAddress is not available through JMX. A On 21 Oct 2010, at 01:55, Gary Dusbabek wrote: > Yes, we should generate it in the right temp directory. That method > is an implementation of an interface method (FailureDetectorMBean), > meant to be invoked by JMX, which is why no other code calls it. > > Gary. > > On Wed, Oct 20, 2010 at 03:48, aaron morton wrote: >> I was reading through some code and noticed the following in >> FailureDetector.dumpInterArrivealTimes() >> >>FileOutputStream fos = new FileOutputStream("/var/tmp/output-" + >> System.currentTimeMillis() + ".dat", true); >> >> If this is meant to be cross platform I'm happy to create a bug and change >> it to use File.createTempFile() . >> >> Also I could not find any use of the dumpInterArrivalTimes(InetAddress ep) >> overload. Anyone know if it should be kept? >> >> thanks >> Aaron >> >>
Question about ColumnFamily Id's
I was helping a guy who in the end had a mixed beta1 and beta2 cluster http://www.mail-archive.com/u...@cassandra.apache.org/msg06661.html I had a look around the code and have a couple of questions, just for my understanding. When ReadResponseSerializer is called to deserialize the response from a node, it calls the RowSerializer which uses the ColumnFamilySerializer. If the CfId in the row is not known on the node an UnserializableColumnFamilyException is thrown. It's an IOException subclass and the error is treated as an Internal Error by the thrift-generated Cassandra server. The read message sent to the node contains the Keyspace+CF names, and it returns its CfId in the response. It looks like if a node somehow has a different/bad schema it can cause reads to fail. Is this correct? Could its response be ignored if the read still meets the CL? Next question was how nodes could ever get to have a different CfId for the same Keyspace+CF pair? It looks like the CfId is never changed, so it would only happen if two nodes were each given a schema update and could not communicate it with each other. I'm guessing the whole scenario is "unsupported", just trying to understand what's happening. Thanks Aaron
Re: /var/tmp in FailureDetector
Too quick for me :) Aaron On 21 Oct 2010, at 17:52, Jonathan Ellis wrote: > Done in r1025822 > > On Wed, Oct 20, 2010 at 12:54 PM, Gary Dusbabek wrote: >> You're right! It looks like dead code that should be removed. >> >> Gary. >> >> On Wed, Oct 20, 2010 at 12:50, aaron morton wrote: >>> I should have mentioned the FailureDetectorMBean only has the parameterless >>> dumpInterArrivalTimes(). >>> >>> The overload that takes InetAddress is not available through JMX. >>> >>> A >>> On 21 Oct 2010, at 01:55, Gary Dusbabek wrote: >>> >>>> Yes, we should generate it in the right temp directory. That method >>>> is an implementation of an interface method (FailureDetectorMBean), >>>> meant to be invoked by JMX, which is why no other code calls it. >>>> >>>> Gary. >>>> >>>> On Wed, Oct 20, 2010 at 03:48, aaron morton >>>> wrote: >>>>> I was reading through some code and noticed the following in >>>>> FailureDetector.dumpInterArrivealTimes() >>>>> >>>>>FileOutputStream fos = new FileOutputStream("/var/tmp/output-" >>>>> + System.currentTimeMillis() + ".dat", true); >>>>> >>>>> If this is meant to be cross platform I'm happy to create a bug and >>>>> change it to use File.createTempFile() . >>>>> >>>>> Also I could not find any use of the dumpInterArrivalTimes(InetAddress >>>>> ep) overload. Anyone know if it should be kept? >>>>> >>>>> thanks >>>>> Aaron >>>>> >>>>> >>> >>> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com
questions about SSTableExport/Import
I was trying to help this guy http://www.mail-archive.com/u...@cassandra.apache.org/msg07297.html who seemed to have trouble loading a JSON file, and I started taking a look at SSTableExport and SSTableImport. SSTableExport does not encode any information about the Column sub type (ExpiringColumn or DeletedColumn). It records isMarkedForDelete(), the timestamp, and the localDeletionTime as the col value if it's a DeletedColumn. SSTableImport then calls either cf.addColumn() or cf.addTombstone() based on the deleted flag. First question: is the code in SSTableImport.addToStandardCF() correct to call cf.addColumn() when the column was isMarkedForDelete() at the time it was serialised? Next, is it OK to lose the fact that a column is an ExpiringColumn (and its ttl) when it's exported to JSON? On my local machine I modified the unit test for SSTableExport as below and the assertion that the col was not returned failed. Thanks Aaron

diff --git a/test/unit/org/apache/cassandra/tools/SSTableExportTest.java b/test/unit/org/apache/cassandra/tools/SSTableExportTest.java
index 6f79f62..53d2a9c 100644
--- a/test/unit/org/apache/cassandra/tools/SSTableExportTest.java
+++ b/test/unit/org/apache/cassandra/tools/SSTableExportTest.java
@@ -179,6 +179,7 @@ public class SSTableExportTest extends SchemaLoader
         // Add rowA
         cfamily.addColumn(new QueryPath("Standard1", null, ByteBufferUtil.bytes("name")), ByteBufferUtil.bytes("val"), 1);
+        cfamily.addColumn(new QueryPath("Standard1", null, ByteBufferUtil.bytes("ttl")), ByteBufferUtil.bytes("val"), 1, 1);
         writer.append(Util.dk("rowA"), cfamily);
         cfamily.clear();

@@ -187,6 +188,15 @@ public class SSTableExportTest extends SchemaLoader
         writer.append(Util.dk("rowExclude"), cfamily);
         cfamily.clear();

+        // make sure the ttl col has expired
+        try
+        {
+            Thread.sleep(1500);
+        }
+        catch (InterruptedException e)
+        {
+            throw new AssertionError(e);
+        }
         SSTableReader reader = writer.closeAndOpenReader();

         // Export to JSON and verify
@@ -203,6 +213,11 @@ public class SSTableExportTest extends SchemaLoader
         assertTrue(cf != null);
         assertTrue(cf.getColumn(ByteBufferUtil.bytes("name")).value().equals(ByteBuffer.wrap(hexToBytes("76616c"))));

+        qf = QueryFilter.getNamesFilter(Util.dk("rowA"), new QueryPath("Standard1", null, null), ByteBufferUtil.bytes("ttl"));
+        cf = qf.getSSTableColumnIterator(reader).getColumnFamily();
+        assertTrue(cf != null);
+        assertTrue(cf.getColumn(ByteBufferUtil.bytes("ttl")) == null);
+
Re: Reducing confusion around client libraries
I agree with the importance of the Thrift API. When I starting using Cassandra I found the idiomatic API's hid the true nature of what Cassandra does. It felt like trying to learn how a RDBMS works by learning how something like (java) hibernate or (ms) LINQ works. IMHO Cassandra *is* the thrift/avro API just like any RDBMS *is* the SQL language. Thanks AaronOn 07 Dec, 2010,at 07:15 AM, Hannes Schmidt wrote:Probably chiming in a little late here, but I liked having the Thrift API documentation in a prominent place. It is a canonical reference that describes on a logical level what the system can and can't do. Without that information it would have been much harder to understand how to use the Hector client. And without that information I wouldn't have been able to pinpoint bugs in libcassandra. Having a language- and platform-independent interface specification is worth gold in my opinion. Moving the clients under the umbrella of the project would increase the danger that the vetted client source becomes the de-facto reference because it would be temptingly easy to modify server and client in lock-step for changes of the on-the-wire format without bothering to document the change. I also like seeing the competition of ideas in the client world. I think it will take some time for the API to mature and settle and a wider variety of client architectures needs to be evaluated before a set of vetted clients should be chosen. On Sun, Dec 5, 2010 at 6:48 AM, Simon Reavely wrote: > Maybe there needs to be a "listing criteria" for a client library, that > includes things like examples for what is considered enough to get folks > started (connections, reads, writes, etc) in addition to what Ran > suggested "[maintainer, last release, next release, support > forum, number of committers, number of users, spring support, jpa support > etc]." I would also have a "who's using us" column as well. > > If the library maintainer does not satisfy the listing criteria they can't > get listed. Then we just need to decide what the criteria is ;-) > > Other than understanding how up to date and frequently maintained a library > is I think that (full) good examples are essential. > > Having said that, I am not actually against some hierarchical organization > in which there is some form of "tested/verified" client library list, then > "others". To keep things fair the question would then be how something gets > to be "tested/verified". In an opensource community I expect the library > developers could take some of this on themselves even if the > testing/verification is part of the main builds by way of some form of > plugin/test suite but my level of thinking on this is shallow. > > Just my 2 cents/pennies on this topic! > > Cheers, > Simon > > On Fri, Dec 3, 2010 at 4:07 PM, Ran Tavory wrote: > > > As developer of one of the client libraries I can say that competition > > keeps > > us the library maintainers healthy and in the long run creates more value > > to > > the users so we should keep competition fair. > > I can certainly see Jonathan's point regarding the level of confusion b/w > > newcomers and I'm all for reducing it, but only as long as there's a fair > > chance for all clients to evolve. 
> > To the points that the server can provide a better interface (avro or CQL > > and what have you), I think this can improve overall client development > but > > will not eliminate the need for clients, there will always be a higher > > level > > and nicer interface a client can provide or plugins to 3rd party (spring > > and > > such) so it does not solve the confusion problem, there will always be > more > > clients as long as cassandra keeps evolving. > > > > I like transparency and I think that if you present users enough data > they > > will be able to decide mind, even new comers. It would be correct to say > > that generally folks who'd been involved with cassandra for a few years > are > > better informed than newcomers however it is sometimes hard to make an > > objective decision and it's also hard to make a one-size-fits-all > decision, > > for example some clients implement feature x and not y and for most users > > it > > makes a lot of sense only that for some users they need y and not x. We > > need > > to be transparent and list the features and tradeoffs and let the users > > decide. > > I like Paul's idea of a table with a list of libraries and for each > library > > a set of columns such as [maintainer, last release, next release, support > > forum, number of committers, number of users, spring support, jpa support > > etc]. There's a challenge of keeping this table up to date but on the > other > > hand if a library maintainer does not keep his row up to date then it's a > > signal. If voting can be made easily then I'm all for it as well as part > of > > this table. I don't think the table would be huge, it's probably 2-3 per > > language. > > > > > > On Fri, Dec 3, 2010 at 10:25 PM, Paul Brown > wrote:
Re: Multi-tenancy, and authentication and authorization
Have a read about JVM heap sizing here http://wiki.apache.org/cassandra/MemtableThresholds If you let people create keyspaces with a mouse click you will soon run out of memory. I use Cassandra to provide a self service "storage service" at my organisation. All virtual databases operate in the same Cassandra keyspace (which does not change), and I use namespaces in the keys to separate things. Take a look at how amazon S3 works, it may give you some ideas. If you want to continue to discussion let's move this to the user list. A On 17/01/2011, at 7:44 PM, indika kumara wrote: > Hi Stu, > > In our app, we would like to offer cassandra 'as-is' to tenants. It that > case, each tenant should be able to create Keyspaces as needed. Based on the > authorization, I expect to implement it. In my view, the implementation > options are as follows. > > 1) The name of a keyspace would be 'the actual keyspace name' + 'tenant ID' > > 2) The name of a keyspace would not be changed, but the name of a column > family would be the 'the actual column family name' + 'tenant ID'. It is > needed to keep a separate mapping for keyspace vs tenants. > > 3) The name of a keypace or a column family would not be changed, but the > name of a column would be 'the actual column name' + 'tenant ID'. It is > needed to keep separate mappings for keyspace vs tenants and column family > vs tenants > > Could you please give your opinions on the above three options? if there > are any issue regarding above approaches and if those issues can be solved, > I would love to contribute on that. > > Thanks, > > Indika > > > On Fri, Jan 7, 2011 at 11:22 AM, Stu Hood wrote: > >>> (1) has the problem of multiple memtables (a large amount just isn't >> viable >> There are some very straightforward solutions to this particular problem: I >> wouldn't rule out running with a very large number of >> keyspace/columnfamilies given some minor changes. >> >> As Brandon said, some of the folks that were working on multi-tenancy for >> Cassandra are no longer focused on it. But the code that was generated >> during our efforts is very much available, and is unlikely to have gone >> stale. Would love to talk about this with you. >> >> Thanks, >> Stu >> >> On Thu, Jan 6, 2011 at 8:08 PM, indika kumara >> wrote: >> >>> Thank you very much Brandon! >>> >>> On Fri, Jan 7, 2011 at 12:40 AM, Brandon Williams >>> wrote: >>> On Thu, Jan 6, 2011 at 12:33 PM, indika kumara wrote: > Hi Brandon, > > I would like you feedback on my two ideas for implementing mufti >>> tenancy > with the existing implementation. Would those be possible to >>> implement? > > Thanks, > > Indika > >> Two vague ideas: (1) qualified keyspaces (by the tenet domain) >>> (2) > multiple Cassandra storage configurations in a single node (one per > tenant). > For both options, the resource hierarchy would be /cassandra/ > //keyspaces// > (1) has the problem of multiple memtables (a large amount just isn't >>> viable right now.) (2) more or less has the same problem, but in JVM >> instances. I would suggest a) not trying to offer cassandra itself, and instead >>> build a service that uses cassandra under the hood, and b) splitting up tenants >>> in this layer. -Brandon >>> >>
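To make the key-namespace idea concrete, something along these lines is all I mean; the helper and the separator are made up for illustration, and in practice you also need to enforce the prefix on every read and slice:

    public class TenantKeys
    {
        // One shared keyspace and CF for everyone; the tenant lives in the row key,
        // not in the keyspace or column family name.
        static String rowKey(String tenantId, String key)
        {
            return tenantId + "/" + key;    // e.g. "acme/users/bob"
        }

        public static void main(String[] args)
        {
            System.out.println(rowKey("acme", "users/bob"));   // acme/users/bob
            System.out.println(rowKey("globex", "users/bob")); // globex/users/bob, a different row in the same CF
        }
    }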
Looking for Cassandra work.
I've decided to leave Weta Digital so I can spend more time working on and with Cassandra. If you would like to hire me from mid March please contact me directly on aa...@thelastpickle.com I'm an Australian based in New Zealand and have skills in Python, Java, C#, Cassandra and other No Sql's , RDBMS, web and fat client development. Cheers Aaron
Re: [VOTE] 0.7.1 (3 times the charm?)
I just re-opened CASSANDRA-2081 https://issues.apache.org/jira/browse/CASSANDRA-2081 there was a bug in StorageProxy.scan() that may need to be included. I listed another possible Message problem in the ticket, may pay to get someone else to give the StorageProxy a good going over. Aaron On 5/02/2011, at 3:06 PM, Jeremy Hanna wrote: > Just wondering - how does the distributed test framework fit into votes? > Does it get run each time a vote happens to check for bugs/regressions? > > On Feb 4, 2011, at 1:40 PM, Eric Evans wrote: > >> >> Lather. Rinse. Repeat. Ya'll know the drill. >> >> SVN: >> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7@r1067260 >> 0.7.1 artifacts: http://people.apache.org/~eevans >> >> The vote will be open for 72 hours. >> >> >> [1]: http://goo.gl/axEK0 (CHANGES.txt) >> [2]: http://goo.gl/66yGY (NEWS.txt) >> >> -- >> Eric Evans >> eev...@rackspace.com >> >
Re: Using Cassandra-cli
There is also extensive online help in cassandra-cli, just type help; Aaron On 08 Feb, 2011, at 07:24 AM, Vishal Gupta wrote: Hi, there is a README.txt file in CASSANDRA_HOME which presents clear steps to use get and set command Also i guess you need to first use Keyspace and then fire set command. Regards, vishal On Mon, Feb 7, 2011 at 11:43 PM, Eranda Sooriyabandara <0704...@gmail.com> wrote: > Hi all, > I tried Cassandra cli option in my machine. Here are my cli commands and > the > outputs. > > >>./cassandra-cli -host localhost -port 9160 -username eranda -keyspace > keyspace1 -password eranda > Keyspace 'keyspace1' not found. > > >>./cassandra-cli -host localhost -port 9160 > Connected to: "Test Cluster" on localhost/9160 > Welcome to cassandra CLI. > > [default@unknown] set keyspace1.standard['emahesh']['first']='eranda'; > Syntax error at position 13: mismatched input '.' expecting '[' > > As the output say my commands did not work well. Here I used the commands > which is in http://wiki.apache.org/cassandra/CassandraCli. I couldn't find > the error of mine. Can anyone please help me to figure out the error. > > thanks > Eranda >
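For the command above, the CLI wants the keyspace selected with use before the set, along these lines (from memory of the 0.7 syntax, and assuming the keyspace and column family already exist; help; shows the exact grammar for your build):

    connect localhost/9160;
    use keyspace1;
    set standard['emahesh']['first'] = 'eranda';
    get standard['emahesh'];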
Re: How do secondary indices work
Moving to the user group. On 08 Feb, 2011, at 11:39 PM, alta...@ceid.upatras.gr wrote: Hello, I'd like some information about how secondary indices work under the hood. 1) Is data stored in some external data structure, or is it stored in an actual Cassandra table, as columns within column families? 2) Is data stored sorted or not? How is it partitioned? 3) How can I access index data? Thanks in a advance, Alexander Altanis
Re: Monitoring Cluster with JMX
Can't you get the length of the list on the monitoring side of things? aaron On 08 Feb, 2011, at 10:25 PM, Roland Gude wrote: Hello, we are trying to monitor our cassandra cluster with Nagios JMX checks. While there are JMX attributes which expose the list of reachable/unreachable hosts, it would be very helpful to have additional numeric attributes exposing the size of these lists. This could be used to set thresholds (in Nagios monitoring) i.e. at least 3 hosts must be reachable before Nagios issues a warning. This is probably not hard to do and we are willing to implement/supply patches if someone could point us in the right direction on where to implement it. Greetings, roland -- YOOCHOOSE GmbH Roland Gude Software Engineer Im Mediapark 8, 50670 Köln +49 221 4544151 (Tel) +49 221 4544159 (Fax) +49 171 7894057 (Mobil) Email: roland.g...@yoochoose.com WWW: www.yoochoose.com YOOCHOOSE GmbH Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann Handelsregister: Amtsgericht Köln HRB 65275 Ust-Ident-Nr: DE 264 773 520 Sitz der Gesellschaft: Köln
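i.e. something along these lines on the Nagios side, reading the existing list-valued attribute over JMX and counting it. The ObjectName, attribute name and port are what I would expect for a 0.7 node, so double check them in jconsole before relying on this:

    import java.util.List;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LiveNodeCount
    {
        public static void main(String[] args) throws Exception
        {
            // default JMX port for 0.7-era nodes is assumed to be 8080; adjust to your config
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            List<?> live = (List<?>) mbs.getAttribute(
                    new ObjectName("org.apache.cassandra.db:type=StorageService"), "LiveNodes");
            System.out.println("live nodes: " + live.size()); // the number Nagios can threshold on
            connector.close();
        }
    }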
Gossip messages at DEBUG
I've just put the latest 0.7 build on a node and it's logging gossip messages at DEBUG and making the logs really hard to use. Anyone object to moving these to TRACE level? e.g. here are 6 in a second for a machine doing nothing.

DEBUG [GossipStage:1] 2011-02-09 15:56:04,259 MessagingService.java (line org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:302)) jb04/192.168.114.63 sending GOSSIP_DIGEST_ACK to 9815@/192.168.114.67
DEBUG [ScheduledTasks:1] 2011-02-09 15:56:04,424 MessagingService.java (line org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:302)) jb04/192.168.114.63 sending GOSSIP_DIGEST_SYN to 9816@/192.168.114.67
DEBUG [ScheduledTasks:1] 2011-02-09 15:56:04,424 MessagingService.java (line org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:302)) jb04/192.168.114.63 sending GOSSIP_DIGEST_SYN to 9817@/192.168.114.65
DEBUG [GossipStage:1] 2011-02-09 15:56:04,424 MessagingService.java (line org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:302)) jb04/192.168.114.63 sending GOSSIP_DIGEST_ACK2 to 9818@/192.168.114.67
DEBUG [GossipStage:1] 2011-02-09 15:56:04,424 MessagingService.java (line org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:302)) jb04/192.168.114.63 sending GOSSIP_DIGEST_ACK2 to 9819@/192.168.114.65
DEBUG [GossipStage:1] 2011-02-09 15:56:04,483 MessagingService.java (line org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:302)) jb04/192.168.114.63 sending GOSSIP_DIGEST_ACK to 9820@/192.168.114.66

Aaron
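In the meantime the noise can be silenced for just that logger in conf/log4j-server.properties (the logger name is taken from the messages above), without losing DEBUG elsewhere:

    log4j.logger.org.apache.cassandra.net.MessagingService=INFO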
Re: Gossip messages at DEBUG
thanks. A On 10 Feb, 2011, at 08:21 AM, Brandon Williams wrote: On Tue, Feb 8, 2011 at 9:01 PM, Aaron Morton <aa...@thelastpickle.com> wrote: > I've just put the latest 0.7 build on a node and it's logging gossip > messages at DEBUG and making the logs really hard to use. Anyone object to > moving these to TRACE level ? > Moved to TRACE. I think when this was moved from sendRR to sendOneWay gossip wasn't considered. -Brandon
Re: RE: SEVERE Data Corruption Problems
Looks like the bloom filter for the row is corrupted, does it happen for all reads or just for reads on one row ? After the upgrade to 0.7 (assuming an 0.7 nightly build) did you run anything like nodetool repair ? Have you tried asking on the #cassandra IRC room to see if their are any comitters around ? AaronOn 11 Feb, 2011,at 01:18 PM, Dan Hendry wrote:Upgraded one node to 0.7. Its logging exceptions like mad (thousands per minute). All like below (which is fairly new to me): ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Threa d[ReadStage:721,5,main] java.io.IOError: java.io.EOFException at org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNa mesIterator.java:75) at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(Nam esQueryFilter.java:59) at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFil ter.java:80) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilySto re.java:1275) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore. java:1167) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore. java:1095) at org.apache.cassandra.db.Table.getRow(Table.java:384) at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadComma nd.java:60) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(Stor ageProxy.java:473) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja va:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9 08) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSeri alizer.java:48) at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSeri alizer.java:30) at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper. java:108) at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableName sIterator.java:106) at org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNa mesIterator.java:71) ... 12 more Dan -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: February-09-11 18:14 To: dev Subject: Re: SEVERE Data Corruption Problems Hi Dan, it would be very useful to test with 0.7 branch instead of 0.7.0 so at least you're not chasing known and fixed bugs like CASSANDRA-1992. As you say, there's a lot of people who aren't seeing this, so it would also be useful if you can provide some kind of test harness where you can say "point this at a cluster and within a few hours On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendrywrote: > I have been having SEVERE data corruption issues with SSTables in my > cluster, for one CF it was happening almost daily (I have since shut down > the service using that CF as it was too much work to manage the Cassandra > errors). At this point, I cant see how it is anything but a Cassandra bug > yet its somewhat strange and very scary that I am the only one who seems to > be having such serious issues. Most of my data is indexed in two ways so I > have been able to write a validator which goes through and back fills > missing data but its kind of defeating the whole point of Cassandra. 
The > only way I have found to deal with issues when they crop up, to prevent nodes > crashing from repeated failed compactions, is to delete the SSTable. My cluster > is running a slightly modified 0.7.0 version which logs which files the errors > are for so that I can stop the node and delete them. > > > > The problem: > > - Reads, compactions and hinted handoff fail with various > exceptions (samples shown at the end of this email) which seem to indicate > sstable corruption. > > - I have seen failed reads/compactions/hinted handoff on 4 out of 4 > nodes (RF=2) for 3 different super column families and 1 standard column > family (4 out of 11) and just now, the Hints system CF. (if it matters the > ring has not changed since one CF which has been giving me trouble was > created). I have checked SMART disk info and run various diagnostics and there > do not seem to be any hardware issues, plus what are the chances of all > four nodes having the same hardware problems at the same time when for all > other purposes, they appear fine? > > - I have added logging which outputs what sstables are causing > exceptions to be thrown. The corrupt sstables have been both freshly flushed > memtables and the output of compaction (ie, 4 sstables which all seem to be > fine get comp
0.6.12 release?
A guy on the user list has asked about getting a 0.6.12 release out that includes the fix for CASSANDRA-2081. Without it, get_range_slice at CL > ONE will time out because message ids are reused. Jonathan has back-ported the relevant parts of the second patch (which concerned get_indexed_slices) from the ticket. Can we get this one released? Aaron
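To make the failure mode concrete, here is a toy illustration (not Cassandra's actual MessagingService) of why reusing a message id while a request is still in flight means the reply can never be matched, so the coordinator just times out:

import java.util.HashMap;
import java.util.Map;

// Toy model of a reply-callback table keyed by message id.
final class CallbackTable
{
    private final Map<String, Runnable> callbacks = new HashMap<String, Runnable>();

    void registerRequest(String messageId, Runnable onReply)
    {
        // If messageId is reused before the first reply arrives, the original
        // callback is silently replaced and that request can only time out.
        callbacks.put(messageId, onReply);
    }

    void onReply(String messageId)
    {
        Runnable callback = callbacks.remove(messageId);
        if (callback != null)
            callback.run();
    }
}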
rewriting cli help
I'm working on moving the cli online help into a yaml file for ease of maintenance and am now trying to merge the existing cli help with what's in cassandra.yaml and the wiki. If you have any desires for how it should look please comment on https://issues.apache.org/jira/browse/CASSANDRA-2008 Thanks Aaron
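As a rough idea of the kind of layout being discussed (this shape is purely hypothetical - the actual schema is being worked out on CASSANDRA-2008):

commands:
    - name: GET
      help: |
        get <cf>['<key>'];
        Fetch a row, or a single column of a row.
    - name: DEL
      help: |
        del <cf>['<key>'];
        Delete a row, or a single column of a row.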
Re: Data model
Will answer on the user list. Aaron On 8/03/2011, at 1:11 AM, Baskar wrote: > Does Cassandra allow nesting of column families? > > Here is the use case > - we need to store calls made by employees > - employees are associated with an account > - accounts have phone numbers > - many calls are made by employees for a given account and phone > > If possible, would like to store call related data against employee. > > Thanks > Baskar
Fwd: batch inserts in cassandra 0.7
batch_insert was deprecated in 0.6; you should have been using batch_mutate http://wiki.apache.org/cassandra/API Aaron Begin forwarded message: > From: Anurag Gujral > Date: 16 March 2011 10:04:56 GMT+13:00 > To: dev@cassandra.apache.org > Subject: batch inserts in cassandra 0.7 > Reply-To: dev@cassandra.apache.org > > Hi All, > I am moving from cassandra 0.6 to 0.7. I was using the function > send_batch_inserts to do batch inserts in cassandra 0.6; when I moved to 0.7 > I don't see the function send_batch_insert. > Is there a way to do batch inserts in cassandra 0.7 using thrift-0.0.5? > > Thanks > Anurag
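For anyone following along, a minimal sketch of a single-column write through batch_mutate against the 0.7 Thrift interface. The keyspace is assumed to have been set already with client.set_keyspace(), and "MyColumnFamily" / "rowkey1" are just placeholders:

import java.nio.ByteBuffer;
import java.util.*;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.Mutation;

public class BatchMutateExample
{
    // Builds a one-column mutation map and sends it with batch_mutate.
    public static void insertOne(Cassandra.Client client) throws Exception
    {
        Column col = new Column();
        col.setName(ByteBuffer.wrap("name".getBytes("UTF-8")));
        col.setValue(ByteBuffer.wrap("value".getBytes("UTF-8")));
        col.setTimestamp(System.currentTimeMillis() * 1000);

        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setColumn(col);

        Mutation mutation = new Mutation();
        mutation.setColumn_or_supercolumn(cosc);

        // row key -> (column family name -> list of mutations)
        Map<String, List<Mutation>> cfMutations = new HashMap<String, List<Mutation>>();
        cfMutations.put("MyColumnFamily", Arrays.asList(mutation));

        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
        mutationMap.put(ByteBuffer.wrap("rowkey1".getBytes("UTF-8")), cfMutations);

        client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);
    }
}

The same map can carry mutations for many rows and column families in one call, which is what replaces the old batch_insert style of usage.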
Re: Limitations on number of secondary indexes
Moving to user. Aaron On 20 Apr 2011, at 10:45, Jason Kolb wrote: > I apologize if this has been answered before, I've tried to do some pretty > exhaustive searching of the archives and haven't been able to see if this > question has been answered before. > > I was wondering if anyone knows if there is a practical upper limit on the > number of secondary indexes used, if they're sparsely populated (say, 10,000 > secondary indexes only 2 of which are populated per row). My understanding > is that Cassandra creates another column family for each secondary index in > the background, so the real limitation would appear to be the number of > column families. > > Is this correct? And if so (or even if not), does anyone know the answer to > the question about the upper limit on the number of secondary indexes? > > Thanks! > Jason
Re: Compacting single file forever
Moving to the user list. Aaron On 20 Apr 2011, at 21:25, Shotaro Kamio wrote: > Hi, > > I found that our cluster repeats compacting a single file forever > (cassandra 0.7.5). We are wondering if the compaction logic is wrong. I'd > like to have comments from you guys. > > Situation: > - After trying to repair a column family, our cluster's disk usage is > quite high. Cassandra cannot compact all sstables at once. I think it > repeats compacting a single file at the end. (you can check the attached > log below) > - Our data doesn't have deletes. So, the compaction of a single file > doesn't make free disk space. > > We are approaching full disk. But I believe that the repair > operation made a lot of duplicate data on the disk and it requires > compaction. However, most of the nodes are stuck compacting a single file. > The only thing we can do is to restart the nodes. > > My question is why the compaction doesn't stop. > > I looked at the logic in CompactionManager.java: > - >String compactionFileLocation = > table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables)); >// If the compaction file path is null that means we have no > space left for this compaction. >// try again w/o the largest one. >List<SSTableReader> smallerSSTables = new > ArrayList<SSTableReader>(sstables); >while (compactionFileLocation == null && smallerSSTables.size() > 1) >{ >logger.warn("insufficient space to compact all requested > files " + StringUtils.join(smallerSSTables, ", ")); >smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables)); >compactionFileLocation = > table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables)); >} >if (compactionFileLocation == null) >{ >logger.error("insufficient space to compact even the two > smallest files, aborting"); >return 0; >} > - > > The while condition is: smallerSSTables.size() > 1 > Should this be "smallerSSTables.size() > 2" ? > > In my understanding, compaction of a single file makes free disk space > only when the sstable has a lot of tombstones and only if the tombstones > are removed in the compaction. If cassandra knows the sstable has > tombstones to be removed, it's worth compacting it. Otherwise, it > might make free space if you are lucky. In the worst case, it leads to an > infinite loop like in our case. > > What do you think of the code change? > > > Best regards, > Shotaro > > > * Cassandra compaction log > - > WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446 > CompactionManager.java (line 405) insufficient space to compact all > requested files SSTableReader( > path='foobar-f-3020-Data.db'), SSTableReader(path='foobar-f-3034-Data.db') > INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833 > CompactionManager.java (line 482) Compacted to > foobar-tmp-f-3035-Data.db. 260,646,760,319 to 260,646,760,319 (~100% > of original) bytes for 6,893,896 keys. Time: 9,855,385ms.
> > WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476 > CompactionManager.java (line 405) insufficient space to compact all > requested files SSTableReader(path='foobar-f-3020-Data.db'), > SSTableReader(path='foobar-f-3036-Data.db') > INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903 > CompactionManager.java (line 482) Compacted to > foobar-tmp-f-3037-Data.db. 260,646,760,319 to 260,646,760,319 (~100% > of original) bytes for 6,893,896 keys. Time: 10,087,424ms. > - > You can see that compacted size is always the same. It repeats > compacting the same single sstable.
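The decision Shotaro is proposing can be expressed as a small standalone guard. This is an illustrative sketch only - SSTableLike and hasDroppableTombstones() are hypothetical stand-ins, not the real SSTableReader API - but it captures the idea: compacting a single sstable is only worth attempting when it can actually drop something.

import java.util.List;

// Illustrative guard: only compact when the work can reclaim disk space.
final class CompactionGuard
{
    interface SSTableLike
    {
        boolean hasDroppableTombstones(); // hypothetical: true if compaction could purge data
    }

    static boolean worthCompacting(List<? extends SSTableLike> candidates)
    {
        if (candidates.size() >= 2)
            return true; // merging two or more sstables can remove duplicate rows
        return candidates.size() == 1 && candidates.get(0).hasDroppableTombstones();
    }
}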
Fwd: Error trying to move a node - 0.7
Will answer on the user list. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com Begin forwarded message: > From: Ben Frank > Date: 17 June 2011 07:42:07 GMT+12:00 > To: dev@cassandra.apache.org > Subject: Error trying to move a node - 0.7 > Reply-To: dev@cassandra.apache.org > > Hi All, > I'm getting the following error when trying to move a nodes token: > > nodetool -h 145.6.92.82 -p 18080 move 56713727820156410577229101238628035242 > cassandra.in.sh executing for environment DEV1 > Exception in thread "main" java.lang.AssertionError >at > org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:393) >at > org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:418) >at > org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:94) >at > org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:807) >at > org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:773) >at > org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1468) >at > org.apache.cassandra.service.StorageService.move(StorageService.java:1605) >at > org.apache.cassandra.service.StorageService.move(StorageService.java:1580) > . > . > . > > my ring looks like this: > > Address Status State LoadOwnsToken > > 113427455640312821154458202477256070484 > 145.6.99.80 Up Normal 1.63 GB 36.05% > 4629135223504085509237477504287125589 > 145.6.92.82 Up Normal 2.86 GB 1.09% > 6479163079760931522618457053473150444 > 145.6.99.81 Up Normal 2.01 GB 62.86% > 113427455640312821154458202477256070484 > > > '80' and '81' are configured to be in the East coast data center and '82' is > in the West > > Anyone shed any light as to what might be going on here? > > -Ben
Re: Reoganizing drivers
I can see the drivers have moved to http://svn.apache.org/repos/asf/cassandra/drivers/ Just wondering where that path is available on git://git.apache.org/cassandra.git These are the remote branches I can find $ git ls-remote | grep drivers From git://git.apache.org/cassandra.git 20635cec24389d83b146af51fa902fcf2d21491b refs/remotes/tags/drivers dd06878fa6b143dbff1e1e338087041b1b230d48 refs/tags/drivers 20635cec24389d83b146af51fa902fcf2d21491b refs/tags/drivers^{} Thanks A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 8 Jun 2011, at 05:01, Eric Evans wrote: > On Tue, 2011-06-07 at 18:40 +0200, Sylvain Lebresne wrote: >> On Tue, Jun 7, 2011 at 3:18 PM, Jonathan Ellis >> wrote: >>> Sounds fine as far as it goes, but don't we want some concept of >>> branches/tags for driver releases too? >> >> Our idea so far (Eric can correct me if I'm wrong :)) was to consider >> the drivers directory as the 'trunk' for drivers, and create branches >> and tags for them alongside the cassandra ones. > > Yup. In fact, I already tagged the Python and Java drivers as > tags/drivers// during the last release (neither of those > driver artifacts corresponded to the same SVN rev, nor did they > correspond to the rev for 0.8.0). >> >> Truth is, I even think that considering the drivers as a whole is not >> granular enough. It's unlikely the different drivers will move at the >> same pace. > > As far as I know, there is no reason that a tag (say > tags/drivers/py/1.1.1) can't point to a subdirectory of drivers/ (i.e. > drivers/py). In fact, that's how the tags mentioned above were done > (except those pointed to branches/cassandra-0.8.0/drivers/). I > think it just boils down to a matter of convention. >> >> *But*, we believe that moving the drivers up one level is at least a >> first step towards something better than the status quo. > > Yeah, even if we decide to do something different later on, this is an > improvement over what we have now. > > -- > Eric Evans > eev...@rackspace.com >
Re: Reoganizing drivers
Asked on #asfinfra and was told the only things mirrored on git are trunk / tags / branches . git-svn it is. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 20 Jun 2011, at 16:28, Jonathan Ellis wrote: > Maybe the non-standard path is giving the git mirror fits. > > On Sun, Jun 19, 2011 at 11:26 PM, aaron morton > wrote: >> I can see the drivers have moved to >> http://svn.apache.org/repos/asf/cassandra/drivers/ >> >> Just wondering where that path is available on >> git://git.apache.org/cassandra.git >> >> These are the remote branches I can find >> >> $ git ls-remote | grep drivers >> From git://git.apache.org/cassandra.git >> 20635cec24389d83b146af51fa902fcf2d21491brefs/remotes/tags/drivers >> dd06878fa6b143dbff1e1e338087041b1b230d48refs/tags/drivers >> 20635cec24389d83b146af51fa902fcf2d21491brefs/tags/drivers^{} >> >> Thanks >> A >> >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 8 Jun 2011, at 05:01, Eric Evans wrote: >> >>> On Tue, 2011-06-07 at 18:40 +0200, Sylvain Lebresne wrote: >>>> On Tue, Jun 7, 2011 at 3:18 PM, Jonathan Ellis >>>> wrote: >>>>> Sounds fine as far as it goes, but don't we want some concept of >>>>> branches/tags for driver releases too? >>>> >>>> Our idea so far (Eric can correct me if I'm wrong :)) was to consider >>>> the drivers directory as the 'trunk' for drivers, and create branches >>>> and tags for them alongside the cassandra ones. >>> >>> Yup. In fact, I already tagged the Python and Java drivers as >>> tags/drivers// during the last release (neither of those >>> driver artifacts corresponded to the same SVN rev, nor did they >>> correspond to the rev for 0.8.0). >>>> >>>> Truth is, I even think that consider the drivers as a whole is not >>>> granular enough. It's unlikely the different drivers will move at the >>>> same pace. >>> >>> As far as I know, there is no reason that a tag (say >>> tags/drivers/py/1.1.1) can't point to a subdirectory of drivers/ (i.e. >>> drivers/py). In fact, that's how the tags mentioned above were done >>> (except those pointed to branches/cassandra-0.8.0/drivers/). I >>> think it just boils down to a matter convention. >>>> >>>> *But*, we believe that moving the drivers up one level is at least a >>>> first step towards something better than the status quo. >>> >>> Yeah, even if we decide to do something different later on, this is an >>> improvement over what we have now. >>> >>> -- >>> Eric Evans >>> eev...@rackspace.com >>> >> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com
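For anyone wanting to work with the drivers tree before a proper git mirror exists, a git-svn checkout of that path looks something like this (the target directory name is just an example):

$ git svn clone http://svn.apache.org/repos/asf/cassandra/drivers cassandra-drivers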
CASSANDRA-2249 not in CHANGES.txt for 1.0
It's in NEWS should it also be in CHANGES? https://issues.apache.org/jira/browse/CASSANDRA-2449 Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com
wiki updates
With 1.0 almost here I was going to try and spruce up the wiki a bit to make things a little more welcoming for new users. I've created a copy of the home page here http://wiki.apache.org/cassandra/FrontPage_draft_aaron as a working draft. I've re-arranged things a little, and added some links to pages that do not yet exist. I was going to use it as a planning tool by working through all the pages linked there to see if they needed updated examples, or were yet to be written, that sort of thing. Thoughts ? I'll probably ask for some volunteers on the user list. Cheers --------- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com
Re: wiki updates
Ok, I'll go ahead and work out a way for people to contribute. @Nick, yes to including CQL examples, and to giving them first, but I would also keep the RPC calls for now, as the RPC interface is still officially supported. @Yang, Happy to re-arrange things once we have some content. For better or worse I used the Hive wiki as a guide https://cwiki.apache.org/confluence/display/Hive/Home . Creating new content takes time; first I'd like to improve what we have and make sure it is correct. Thanks ----- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 7/10/2011, at 1:24 AM, Yi Yang wrote: > Thanks Aaron for the hard work. The new front page gives a much clearer > image of Cassandra. > > However I would also like to present some of my thoughts - based on my own > learning path: > > 1) The first part should present a clear image of what Cassandra is, and > what's inside Cassandra - thus we'd better include the Data Model section in > it - people will easily get to know the difference between Cassandra > and other key-based databases like HBase, MongoDB etc., and therefore make wise > choices. > > 2) The second part could give a glance at how to get Cassandra running for a > small to moderate level application - aka a small development platform, > including how to create a cluster and also running on Windows / Amazon > EC2. Here StorageConfiguration is not so important because the target of > this section is to help users build up a usable Cassandra Cluster. > > 3) The third part can help the administrators of large scale applications, > introducing management strategies like storage configuration and > also other monitoring and node operations techniques. > > The remaining parts are great. I hope we can match the list with a typical > user's learning experience, so that users at different levels can > focus on their own section. It's just my own idea - and it might differ > from others' experiences; hopefully it can help. Thanks again for the > great work. > > Best, > Yi > > On Thu, Oct 6, 2011 at 7:29 PM, aaron morton wrote: > >> With 1.0 almost here I was going to try and spruce up the wiki a bit to >> make things a little more welcoming for new users. >> >> I've created a copy of the home page here >> http://wiki.apache.org/cassandra/FrontPage_draft_aaron as a working draft. >> >> I've re-arranged things a little, and added some links to pages that do not >> yet exist. I was going to use it as a planning tool by working through all >> the pages linked there to see if they needed updated examples, or were yet >> to be written, that sort of thing. >> >> Thoughts ? >> >> I'll probably ask for some volunteers on the user list. >> >> Cheers >> >> - >> Aaron Morton >> Freelance Cassandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >>
Re: cassandra node is not starting
What version of cassandra and what OS ? It sort of looks like it tried to delete a secondary index CF that was defined in the system KS. Turn the logging up to DEBUG and see what happens. Hope that helps. Aaron - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 1/01/2012, at 11:20 PM, Michael Vaknine wrote: > Hi, > > > > During restart the cassandra node is failing to start > > The error is > > ERROR [main] 2012-01-01 05:03:42,903 AbstractCassandraDaemon.java (line 354) > Exception encountered during startup > > java.lang.AssertionError: attempted to delete non-existing file > AttractionUserIdx.AttractionUserIdx_09partition_idx-h-1-Data.db > >at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49) > >at > org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44) > >at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:133) > >at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:355) > >at > org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:402) > >at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:174) > >at > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:337) > >at > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107) > > > > Can someone tell me how to recover from that? > > Thanks > > > > Michael >
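For context, the check that aborts startup here is roughly of this shape (a paraphrase, not the exact 1.0 FileUtils source). It fires because Cassandra runs with assertions enabled (-ea), so a missing file that the directory scrub expected to clean up turns into an AssertionError before the daemon finishes setup:

import java.io.File;
import java.io.IOError;
import java.io.IOException;

// Paraphrase of the delete-with-confirm check behind the error above.
final class FileUtilsSketch
{
    static void deleteWithConfirm(String path)
    {
        File file = new File(path);
        // With -ea (the Cassandra default), a missing file aborts startup here.
        assert file.exists() : "attempted to delete non-existing file " + file.getName();
        if (!file.delete())
            throw new IOError(new IOException("failed to delete " + file.getAbsolutePath()));
    }
}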
Re: Welcome committer Aaron Morton!
Thanks Jonathan and the other committers. Cheers :) - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/01/2012, at 7:19 AM, Jonathan Ellis wrote: > The Apache Cassandra PMC has voted to add Aaron as a committer. > Thanks for helping make Cassandra what it is today! > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com
Re: How to create a table in Cassandra
This question belongs on the user list, I will answer it there. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/01/2012, at 3:36 AM, anandbab...@polarisft.com wrote: > > Can anyone tell me how to create a table in the Cassandra. I have > installed it... and I am new to this... > Thanks, > Barnabas > > > > This e-Mail may contain proprietary and confidential information and is sent > for the intended recipient(s) only. If by an addressing or transmission > error this mail has been misdirected to you, you are requested to delete this > mail immediately. You are also hereby notified that any use, any form of > reproduction, dissemination, copying, disclosure, modification, distribution > and/or publication of this e-mail message, contents or its attachment other > than by its intended recipient/s is strictly prohibited. > > Visit us at http://www.polarisFT.com >
Re: Thrift vs. CQL
This question belongs on the user list, I will answer it there. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/01/2012, at 1:26 AM, bxqdev wrote: > Hello! > > Datastax's Cassandra documentation says that the CQL API is the future of > the Cassandra API. It also says that eventually the Thrift API will be removed > completely. Is it true? Do you have any plans of removing the Thrift API, leaving > the CQL API only? > > thanks.
Re: extra diffs showing up in update column family
Can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA with steps to reproduce? Thanks p.s. the user list is the appropriate list for emails like this. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 31/01/2012, at 9:31 AM, Dave Brosius wrote: > If a user specifies a Comparator in an update column family (as was reported by an > IRC user), such as > > update column family report_by_account_content with comparator=UTF8Type and > column_metadata = [{ column_name:'meta:account-id', > validation_class:UTF8Type,index_type:KEYS},{ column_name:'meta:filter-hash', > validation_class:UTF8Type,index_type:KEYS}]; > > > The comparator value is seen as different because the original comparator was > the fully qualified name org.apache.cassandra.db.marshal.UTF8Type, and the > new one is what was passed in, UTF8Type. So CFMetaData.diff sees this as a > change and does extra work because of it. > > I'm guessing there are other class name values where this holds true as well. > > Is this a big enough concern to address? > > thanks > dave > >
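A sketch of the normalisation Dave is describing - treating an unqualified comparator name as shorthand for the org.apache.cassandra.db.marshal package before the diff is taken. The class and method names here are illustrative only, not the CFMetaData API:

// Illustrative only: compare comparator names after qualifying shorthand forms.
final class ComparatorNames
{
    private static final String MARSHAL_PACKAGE = "org.apache.cassandra.db.marshal.";

    static String qualify(String name)
    {
        return name.contains(".") ? name : MARSHAL_PACKAGE + name;
    }

    static boolean sameComparator(String a, String b)
    {
        return qualify(a).equals(qualify(b));
    }
}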
Re: understanding cassandra internal
The code is where it's at, and... http://www.datastax.com/2011/08/video-cassandra-internals-presentation-from-cassandra-sf-2011 http://wiki.apache.org/cassandra http://planetcassandra.org/ http://www.datastax.com/docs/1.0/index Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 31/01/2012, at 12:07 PM, Thanh Do wrote: > hi all, > > I would like to study the internal code > of cassandra. The website (wiki) > provides limited documentation. > > Is there any way (documents, blogs) > that mention in details about how > cassandra internally works? > Is there a fast way beside > walking through the code > and reason about how it works. > > many thanks, > Thanh
Re: extra diffs showing up in update column family
Sorry, I thought bug reports went to the user list. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 31/01/2012, at 1:39 PM, Brandon Williams wrote: > On Mon, Jan 30, 2012 at 6:36 PM, aaron morton wrote: >> p.s. the user list is the appropriate list for emails like this. > > I disagree, this was on-topic for dev@ imho. > > -Brandon
Re: Queries on AuthN and AuthZ for multi tenant Cassandra
The existing authentication plug-in does not support row level authorization. You will need to add authentication to your API layer to ensure that a request from client X always has the client X key prefix. Or modify cassandra to provide row level authorization. The 1.x Memtable memory management is awesome, but I would still be hesitant about creating KS's and CF's at the request of an API client. Cheers ----- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/02/2012, at 8:52 AM, Subrahmanya Harve wrote: > We are using Cassandra 0.8.7 and building a multi-tenant cassandra platform > where we have a common KS and common CFs for all tenants. By using Hector's > virtual keyspaces, we are able to modify rowkeys to have a tenant > specific id. (Note that we do not allow tenants to modify/create KS/CF. We > just allow tenants to write and read data.) However we are in the process of > adding authentication and authorization on top of this platform such that > no tenant should be able to retrieve data belonging to any other tenant. > > By configuring Cassandra for security using the documentation here - > http://www.datastax.com/docs/0.8/configuration/authentication , we were > able to apply the security constraints on the common keyspace and common > CFs. However this does not prevent a tenant from retrieving data belonging > to another tenant. For this to happen, we would need to have separate CFs > and/or keyspaces for each tenant. > Looking for more information on the topic here > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-Multi-tenancy-and-authentication-and-authorization-td5935230.html and > other places, it looks like the recommendation is "not" to create > separate CFs and KSs for every tenant as this would have impacts on > Memtables and other memory issues. Does this recommendation still hold > good? > With jiras like > https://issues.apache.org/jira/browse/CASSANDRA-2006 resolved, does it > mean we can now create multiple (but limited) CFs and KSs? > More generally, how do we prevent a tenant from intentional/accidental > manipulation of data owned by another tenant? (given that all tenants will > provide the right credentials)
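To illustrate the API-layer check being suggested (everything here is hypothetical - the "tenantId:" prefix convention and class names are not part of Cassandra or Hector): authenticate the tenant once, then refuse or rewrite any row key that is not scoped to that tenant before the request reaches the cluster.

// Hypothetical guard applied in the API layer before any read or write is issued.
final class TenantKeyGuard
{
    static String scopedKey(String authenticatedTenantId, String requestedKey)
    {
        String prefix = authenticatedTenantId + ":";
        if (requestedKey.startsWith(prefix))
            return requestedKey; // already scoped to this tenant
        if (requestedKey.contains(":"))
            throw new SecurityException("key appears to reference another tenant: " + requestedKey);
        return prefix + requestedKey; // transparently scope an unqualified key
    }
}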
Re: [VOTE] Release Apache Cassandra 0.8.10
+1 - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 9/02/2012, at 5:19 AM, Sylvain Lebresne wrote: > It's been close to 2 months since 0.8.9 and while things are mostly calm on > the 0.8 branch, we do have a few fixes in there that is worth releasing. > I thus propose the following artifacts for release as 0.8.10. > > Git sha1: 038b8f212eb37c98ff4f230b722bc9a76daf1658 > Git: > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/0.8.10-tentative > Artifacts: > https://repository.apache.org/content/repositories/orgapachecassandra-209/org/apache/cassandra/apache-cassandra/0.8.10/ > Staging repository: > https://repository.apache.org/content/repositories/orgapachecassandra-209/ > > The artifacts as well as the debian package are also available here: > http://people.apache.org/~slebresne/ > > The vote will be open for 72 hours (longer if needed). > > [1]: http://goo.gl/ZOnuf (CHANGES.txt) > [2]: http://goo.gl/EXtfL (NEWS.txt)
Re: Welcome committer Peter Schuller
Congratulations. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 8:08 AM, Peter Schuller wrote: >> The Apache Cassandra PMC has voted to add Peter as a committer. Thank >> you Peter, and we look forward to continuing to work with you! > > Thank *you*, as do I :) > > -- > / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: nosetests
Looks like it's hanging while talking to the cluster. Ensure cassandra is running and on default ports. I also run nosetests with -vdx for verbose, detailed errors and stop of first fail (http://readthedocs.org/docs/nose/en/latest/usage.html#extended-usage) Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/04/2012, at 6:35 AM, Mark Dewey wrote: > PS I got the following trace when I aborted. > > Traceback (most recent call last): > File "/usr/bin/nosetests", line 9, in >load_entry_point('nose==0.11.4', 'console_scripts', 'nosetests')() > File "/usr/lib/pymodules/python2.7/nose/core.py", line 117, in __init__ >**extra_args) > File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__ >self.runTests() > File "/usr/lib/pymodules/python2.7/nose/core.py", line 196, in runTests >result = self.testRunner.run(self.test) > File "/usr/lib/pymodules/python2.7/nose/core.py", line 61, in run >test(result) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 176, in __call__ >return self.run(*arg, **kw) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 223, in run >test(orig) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 176, in __call__ >return self.run(*arg, **kw) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 223, in run >test(orig) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 176, in __call__ >return self.run(*arg, **kw) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 223, in run >test(orig) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 176, in __call__ >return self.run(*arg, **kw) > File "/usr/lib/pymodules/python2.7/nose/suite.py", line 223, in run >test(orig) > File "/usr/lib/pymodules/python2.7/nose/case.py", line 44, in __call__ >return self.run(*arg, **kwarg) > File "/usr/lib/pymodules/python2.7/nose/case.py", line 132, in run >self.runTest(result) > File "/usr/lib/pymodules/python2.7/nose/case.py", line 150, in runTest >test(result) > File "/usr/lib/python2.7/unittest/case.py", line 385, in __call__ >return self.run(*args, **kwds) > File "/usr/lib/python2.7/unittest/case.py", line 312, in run >self.setUp() > File "/usr/lib/pymodules/python2.7/nose/case.py", line 367, in setUp >try_run(self.inst, ('setup', 'setUp')) > File "/usr/lib/pymodules/python2.7/nose/util.py", line 491, in try_run >return func() > File "/home/mildewey/Projects/cassandra/test/system/__init__.py", line > 113, in setUp >self.define_schema() > File "/home/mildewey/Projects/cassandra/test/system/__init__.py", line > 180, in define_schema >self.client.system_add_keyspace(ks) > File > "/home/mildewey/Projects/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", > line 1440, in system_add_keyspace >return self.recv_system_add_keyspace() > File > "/home/mildewey/Projects/cassandra/interface/thrift/gen-py/cassandra/Cassandra.py", > line 1451, in recv_system_add_keyspace >(fname, mtype, rseqid) = self._iprot.readMessageBegin() > File > "/usr/local/lib/python2.7/dist-packages/thrift/protocol/TBinaryProtocol.py", > line 137, in readMessageBegin >name = self.trans.readAll(sz) > File > "/usr/local/lib/python2.7/dist-packages/thrift/transport/TTransport.py", > line 58, in readAll >chunk = self.read(sz-have) > File > "/usr/local/lib/python2.7/dist-packages/thrift/transport/TTransport.py", > line 272, in read >self.readFrame() > File > "/usr/local/lib/python2.7/dist-packages/thrift/transport/TTransport.py", > line 276, in readFrame >buff = self.__trans.readAll(4) > File > 
"/usr/local/lib/python2.7/dist-packages/thrift/transport/TTransport.py", > line 58, in readAll >chunk = self.read(sz-have) > File > "/usr/local/lib/python2.7/dist-packages/thrift/transport/TSocket.py", line > 94, in read >buff = self.handle.recv(sz) > > > On Sat, Apr 14, 2012 at 1:34 PM, Mark Dewey wrote: > >> I thought I followed the instructions to set up the nose tests, but when I >> run them all they do is (slowly) print out ".E" and then hang. Any clues? >> >> Mark >>
Re: Server Side Logic/Script - Triggers / StoreProc
Out of interest some questions… When writing through triggers how do you handle the CL guarantee ? Is the CL level checked once at the start or checked for each embedded code invocation ? Do you still guarantee the (non counter) writes as idempotent ? i.e. do the triggers need to be deterministic ? Can clients retry operations that timed out ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/04/2012, at 5:13 AM, Colin Clark wrote: > In my opinion, triggers/stored procedures are an absolute requirement for any > distributed database. > > We've been using stored procedures in Cassandra now for a while, we've made > modifications such that we don't really write directly anymore but pass > everything through either a default stored procedures (which is just what was > there before) or a dynamically loaded piece of java. > > These stored procedures can call other dynamically loaded pieces of java as > well - we don't have any plans to implement any scripting capabilities. We > can also 'select' from procedures. > > The idea of downloading data from a distributed data base for processing > flies in the face of what nosql and bigdata is all about - you've got to do > it in the db. > > On Apr 22, 2012, at 11:35 AM, Brian O'Neill wrote: > >> Praveen, >> >> We are certainly interested. To get things moving we implemented an add-on >> for Cassandra to demonstrate the viability (using AOP): >> https://github.com/hmsonline/cassandra-triggers >> >> Right now the implementation executes triggers asynchronously, allowing you >> to implement a java interface and plugin your own java class that will get >> called for every insert. >> >> Per the discussion on 1311, we intend to extend our proof of concept to be >> able to invoke scripts as well. (minimally we'll enable javascript, but >> we'll probably allow for ruby and groovy as well) >> >> -brian >> >> On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote: >> >>> I found that Triggers are coming in Cassandra 1.2 >>> (https://issues.apache.org/jira/browse/CASSANDRA-1311) but no mention of >>> any StoreProc like pattern. >>> >>> I know this has been discussed so many times but never met with any >>> initiative. Even Groovy was staged out of the trunk. >>> >>> Cassandra is great for logging and as such will be infinitely more useful >>> if some logic can be pushed into the Cassandra cluster nearer to the >>> location of Data to generate a materialized view useful for applications. >>> >>> Server Side Scripts/Routines in Distributed Databases could soon prove to >>> be the differentiating factor. >>> >>> Let me reiterate things with a use case. >>> >>> In our application we store time series data in wide rows with TTL set on >>> each point to prevent data from growing beyond acceptable limits. Still the >>> data size can be a limiting factor to move all of it from the cluster node >>> to the querying node and then to the application via thrift for processing >>> and presentation. >>> >>> Ideally we should process the data on the residing node and pass only the >>> materialized view of the data upstream. This should be trivial if Cassandra >>> implements some sort of server side scripting and CQL semantics to call it. >>> >>> Is anybody else interested in a similar feature? Is it being worked on? Are >>> there any alternative strategies to this problem? 
>>> >>> Praveen >>> >>> >> >> -- >> Brian ONeill >> Lead Architect, Health Market Science (http://healthmarketscience.com) >> mobile:215.588.6024 >> blog: http://weblogs.java.net/blog/boneill42/ >> blog: http://brianoneill.blogspot.com/ >> >
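To make the questions above a little more concrete, here is the hypothetical shape of a per-insert hook like the one Brian describes - this is not the actual cassandra-triggers or CASSANDRA-1311 API, just a sketch of the contract that would need defining. Whether implementations must be deterministic and idempotent is exactly what the consistency-level and retry questions hinge on.

import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical per-insert hook; interface name and signature are illustrative only.
interface InsertTrigger
{
    // Called after a mutation is applied locally. If timed-out client writes may be
    // retried, implementations should be deterministic and idempotent.
    void onInsert(String keyspace, String columnFamily, ByteBuffer rowKey, List<ByteBuffer> columnNames);
}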