sstable2json
Hi, is there a tool similar to sstable2json that can be used to convert the data in a commitlog to JSON? Or does sstable2json let us read the commitlog as well?

Regards,
smh.
Re: sstable2json
Thanks Edward. That's a good idea.

Regards,
smh.

On Sat, Apr 23, 2011 at 11:04 AM, Edward Capriolo wrote:
> On Fri, Apr 22, 2011 at 9:07 PM, Jonathan Ellis wrote:
> > No.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
>
> You could:
> 1) Configure a second instance of Cassandra.
> 2) Copy the commit logs to it.
> 3) Start the instance (this will replay the commit logs into memtables).
> 4) Flush the column families.
> 5) Run sstable2json on the results.
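Edward's five-step recipe can be scripted. A rough sketch in Python — the directory layout, the `CommitLog-*.log` glob, the `nodetool` host, and the SSTable path are all placeholder assumptions, not details from the thread:

```python
import shutil
import subprocess
from pathlib import Path

def replay_plan(keyspace: str, sstable: Path) -> list[list[str]]:
    """Commands for steps 3-5 of the recipe. Step 1 (configure a scratch
    instance) is manual; step 2 is copy_logs below."""
    return [
        ["cassandra", "-f"],                                 # 3) start: replays the copied logs into memtables
        ["nodetool", "-h", "localhost", "flush", keyspace],  # 4) flush memtables out to SSTables
        ["sstable2json", str(sstable)],                      # 5) dump the flushed SSTable to JSON
    ]

def copy_logs(src_commitlog_dir: Path, scratch_commitlog_dir: Path) -> int:
    """Step 2: copy the production commit logs into the scratch instance's
    commitlog directory (filename pattern is an assumption)."""
    logs = sorted(src_commitlog_dir.glob("CommitLog-*.log"))
    for log in logs:
        shutil.copy(log, scratch_commitlog_dir)
    return len(logs)

def run(plan: list[list[str]]) -> None:
    """Execute the planned commands, stopping on the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Building the plan as data before running it makes the steps easy to inspect, e.g. `replay_plan("K1", Path("/var/lib/cassandra/data/K1/CF-g-1-Data.db"))`.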
Data retrieval inconsistent
I am facing an issue in a 0.8.7 cluster.

I have two clusters in two DCs (rather, one cross-DC cluster) and two keyspaces, but I have configured only one keyspace to replicate data to the other DC; the other keyspace should not replicate over. Basically, this is how I ran the keyspace creation:

create keyspace K1 with placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}];
create keyspace K2 with placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{DC1:2, DC2:2}];

I had to do this because I expect K1 to get a large volume of data and I do not want it wired over to the other DC.

I am writing at CL=ONE and reading at CL=ONE, and I am seeing an issue where sometimes I get the data back and other times I do not. Does anyone know what could be going on here?

A second, larger question: I am migrating from 0.7.4 to 0.8.7. I can see there are large changes in the yaml file, but one specific question I had was: how do I configure disk_access_mode like I used to in 0.7.4?

One observation I have made is that some nodes of the cross-DC cluster are at different system times. This is something to fix, but could it be why data is sometimes retrieved and other times not? Or is there something else to it?

Would appreciate a quick response.
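As background on the two definitions above: SimpleStrategy places replicas by walking the combined ring and ignores DC boundaries, while NetworkTopologyStrategy pins an explicit replica count inside each named DC. A toy Python model of what each definition implies (the function and return shape are illustrative only, not Cassandra code):

```python
def replicas_per_dc(strategy: str, options: dict) -> dict:
    """Toy model of replica placement for a keyspace definition.

    SimpleStrategy ignores DC boundaries: with replication_factor:1 the
    single copy of a row may land in either DC, depending only on where
    the row's token falls on the combined ring. NetworkTopologyStrategy
    pins an explicit replica count per named DC.
    """
    if strategy.endswith("SimpleStrategy"):
        return {"ring-wide": int(options["replication_factor"])}
    return {dc: int(n) for dc, n in options.items()}

# K1: one copy somewhere on the cross-DC ring (either DC)
k1 = replicas_per_dc("org.apache.cassandra.locator.SimpleStrategy",
                     {"replication_factor": 1})
# K2: two copies pinned in each DC
k2 = replicas_per_dc("org.apache.cassandra.locator.NetworkTopologyStrategy",
                     {"DC1": 2, "DC2": 2})
```

The point of the model: K1's single replica is not confined to one DC, which is exactly the behavior questioned later in the thread.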
Re: Data retrieval inconsistent
Thanks Ed and Jeremiah for that useful info.

"I am pretty sure the way you have K1 configured it will be placed across both DC's as if you had a large ring. If you want it only in DC1 you need to say DC1:1, DC2:0."

In fact, I do want K1 to be available across both DCs as if I had a large ring; I just do not want it to replicate across DCs. I did try DC1:1, DC2:0 as you suggested, but won't that mean all my data goes into DC1, irrespective of whether it arrives at the nodes of DC1 or DC2, thereby creating a "hot DC"? Since the volume of data in this case is huge, that might create a load imbalance on DC1. (Am I missing something?)

On Thu, Nov 10, 2011 at 1:30 PM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:
> I am pretty sure the way you have K1 configured it will be placed across
> both DC's as if you had a large ring. If you want it only in DC1 you need
> to say DC1:1, DC2:0.
>
> If you are writing and reading at ONE you are not guaranteed to get the
> data if RF > 1. If RF = 2 and you write with ONE, your data could be
> written to server 1 and then read from server 2 before it gets over there.
>
> The differing server times will only really matter for TTLs. Most
> everything else works off comparing user-supplied times.
>
> -Jeremiah
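Jeremiah's explanation is the standard replica-overlap rule: a read at consistency level R after a write at W is only guaranteed to return that write when W + R > RF, i.e. when the two replica sets must intersect. A quick illustrative check:

```python
def overlap_guaranteed(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when any write-replica set and any read-replica set of the
    given sizes must share at least one node out of rf replicas."""
    return write_replicas + read_replicas > rf

# Writing and reading at ONE with RF=2: 1 + 1 = 2, which is not > 2, so a
# read may hit the one replica the write has not reached yet -- exactly
# the intermittent "sometimes I get the data" behavior described above.
assert not overlap_guaranteed(rf=2, write_replicas=1, read_replicas=1)

# Write at ONE but read at QUORUM (2 of 2 here): 1 + 2 > 2, so the read
# set must include the replica that took the write.
assert overlap_guaranteed(rf=2, write_replicas=1, read_replicas=2)
```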
Re: Queries on AuthN and AuthZ for multi tenant Cassandra
Thanks for the response Aaron.

We do not anticipate more than 10-15 tenants on the cluster. Even if one does decide to create one KS per tenant, there is the problem of variable loads on the KS's. I went through http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management which does promise better memory management. I did have two more questions:

- Was the new memory management written taking into account a situation of many KS's? (In other words, did multi-tenancy influence the re-design of memory management?)
- I know that users trying out multi-tenancy generally recommend not creating many KS's/CF's, but I am wondering if there is any documentation for why this is, or details on the negative impact on memory/performance? And are there any performance benchmarks available for Cassandra 1.0 clusters with many KS's?

On Wed, Feb 1, 2012 at 12:11 PM, aaron morton wrote:
> The existing authentication plug-in does not support row level
> authorization.
>
> You will need to add authentication to your API layer to ensure that a
> request from client X always has the client X key prefix. Or modify
> cassandra to provide row level authentication.
>
> The 1.x Memtable memory management is awesome, but I would still be
> hesitant about creating KS's and CF's at the request of an API client.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/02/2012, at 8:52 AM, Subrahmanya Harve wrote:
>
> > We are using Cassandra 0.8.7 and building a multi-tenant Cassandra
> > platform where we have a common KS and common CFs for all tenants. By
> > using Hector's virtual keyspaces, we are able to modify row keys to
> > carry a tenant-specific id. (Note that we do not allow tenants to
> > modify/create KS/CF; we just allow tenants to write and read data.)
> > However, we are in the process of adding authentication and
> > authorization on top of this platform such that no tenant should be
> > able to retrieve data belonging to any other tenant.
> >
> > By configuring Cassandra for security using the documentation at
> > http://www.datastax.com/docs/0.8/configuration/authentication , we were
> > able to apply the security constraints on the common keyspace and
> > common CFs. However, this does not prevent a tenant from retrieving
> > data belonging to another tenant. For that to happen, we would need
> > separate CFs and/or keyspaces for each tenant.
> >
> > Looking for more information on the topic at
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-Multi-tenancy-and-authentication-and-authorization-td5935230.html
> > and other places, it looks like the recommendation is "not" to create
> > separate CFs and KSs for every tenant, as this would have impacts on
> > Memtables and other memory issues. Does this recommendation still hold?
> > With JIRAs like https://issues.apache.org/jira/browse/CASSANDRA-2006
> > resolved, does it mean we can now create multiple (but limited) CFs
> > and KSs?
> > More generally, how do we prevent a tenant from intentional/accidental
> > manipulation of data owned by another tenant? (Given that all tenants
> > will provide the right credentials.)
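The row-key prefixing that Hector's virtual keyspaces provide, combined with the API-layer enforcement Aaron describes, can be sketched roughly like this. The `:` separator and the function names are assumptions for illustration, not Hector's actual encoding:

```python
def prefix_key(tenant_id: str, row_key: str, sep: str = ":") -> str:
    """Namespace a row key by tenant so all tenants can share one CF
    without their keys colliding."""
    if sep in tenant_id:
        # otherwise tenant "a" could forge keys into tenant "a:b"'s space
        raise ValueError("tenant id must not contain the separator")
    return f"{tenant_id}{sep}{row_key}"

def may_access(tenant_id: str, stored_key: str, sep: str = ":") -> bool:
    """API-layer authorization: a tenant may only read or write keys
    carrying its own prefix, since Cassandra 0.8 cannot enforce this."""
    return stored_key.startswith(tenant_id + sep)

assert prefix_key("tenant-a", "user42") == "tenant-a:user42"
assert may_access("tenant-a", "tenant-a:user42")
assert not may_access("tenant-b", "tenant-a:user42")
```

The design point matches the thread: the shared-KS/shared-CF layout keeps the number of Memtables constant, so isolation has to be enforced in the API layer in front of Cassandra rather than by Cassandra's own per-KS permissions.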
Re: Queries on AuthN and AuthZ for multi tenant Cassandra
Thank you. Is that true for 0.8.7 as well?

On Thu, Feb 2, 2012 at 8:20 AM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:
> 10-15 KS should be fine. The issue is when you want to have hundreds or
> thousands of KS/CF.
>
> -Jeremiah