sstable2json

2011-04-22 Thread Subrahmanya Harve
Hi,

Is there a tool similar to sstable2json that can be used to convert data in
commitlog to json? Or does sstable2json let us read the commitlog as well?

Regards,
smh.


Re: sstable2json

2011-04-24 Thread Subrahmanya Harve
Thanks Edward. That's a good idea.

Regards,
smh.


On Sat, Apr 23, 2011 at 11:04 AM, Edward Capriolo wrote:

> On Fri, Apr 22, 2011 at 9:07 PM, Jonathan Ellis  wrote:
> > No.
> >
> > On Fri, Apr 22, 2011 at 3:22 PM, Subrahmanya Harve
> >  wrote:
> >> Hi,
> >>
> >> Is there a tool similar to sstable2json that can be used to convert data in
> >> commitlog to json? Or does sstable2json let us read the commitlog as well?
> >>
> >> Regards,
> >> smh.
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
> >
>
> You could
> 1) configure a second instance of Cassandra
> 2) copy the commit logs to it
> 3) start the instance (this will replay the commit logs into memtables)
> 4) flush the column families
> 5) run sstable2json on the results
>
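
The steps above can be sketched as a small script. This is only a rough
outline under stated assumptions: the directory layout, the CommitLog-* file
name pattern, and all helper names are hypothetical, and the second instance
is assumed to be started by hand between the copy and the flush (step 3).
nodetool flush and sstable2json are the stock tools shipped with Cassandra;
check their exact usage against your version before relying on this.

```python
import shutil
import subprocess
from pathlib import Path


def copy_commitlogs(src_dir, dst_dir):
    """Step 2: copy commit log segments into the scratch instance's commitlog dir.

    Assumes segment files are named CommitLog-*; adjust the pattern if yours differ.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for segment in sorted(Path(src_dir).glob("CommitLog-*")):
        shutil.copy2(segment, dst / segment.name)
        copied.append(segment.name)
    return copied


def flush_command(host, keyspace):
    """Step 4: nodetool invocation that flushes memtables to SSTables."""
    return ["nodetool", "-h", host, "flush", keyspace]


def export_commands(data_dir, keyspace):
    """Step 5: one sstable2json invocation per data file of the keyspace.

    Assumes data files live in <data_dir>/<keyspace>/ and end in -Data.db.
    """
    ks_dir = Path(data_dir) / keyspace
    return [["sstable2json", str(p)] for p in sorted(ks_dir.glob("*-Data.db"))]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes it easy to
print and review what would be executed against the scratch instance first.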


Data retrieval inconsistent

2011-11-10 Thread Subrahmanya Harve
I am facing an issue in a 0.8.7 cluster -

- I have two clusters in two DCs (rather, one cross-DC cluster) and two
keyspaces. I have configured only one keyspace to replicate data to the
other DC; the other keyspace should not replicate to the other DC.
Basically this is how I ran the keyspace creation -
create keyspace K1 with
placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
strategy_options = [{replication_factor:1}];
create keyspace K2 with
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options = [{DC1:2, DC2:2}];

I had to do this because I expect that K1 will get a large volume of data
and I do not want this sent over the wire to the other DC.

I am writing the data at CL=ONE and reading the data at CL=ONE. I am seeing
an issue where sometimes I get the data and other times I do not. Does
anyone know what could be going on here?

A second, larger question - I am migrating from 0.7.4 to 0.8.7, and I can
see that there are large changes in the yaml file. Specifically, how do I
configure disk_access_mode like it used to be in 0.7.4?

One observation I have made is that some nodes of the cross-DC cluster are
at different system times. This is something to fix, but could this be why
data is sometimes retrieved and other times not? Or is there something else
to it?

Would appreciate a quick response.


Re: Data retrieval inconsistent

2011-11-10 Thread Subrahmanya Harve
Thanks Ed and Jeremiah for that useful info.
"I am pretty sure the way you have K1 configured it will be placed across
both DC's as if you had large ring.  If you want it only in DC1 you need to
say DC1:1, DC2:0."
In fact I do want K1 to be available across both DCs as if I had a large
ring. I just do not want it to replicate across DCs. I did try doing it
like you said, DC1:1, DC2:0, but won't that mean that all my data goes
into DC1 irrespective of whether it is written through nodes of DC1 or
DC2, thereby creating a "hot DC"? Since the volume of data for this case
is huge, might that create a load imbalance on DC1? (Am I missing
something?)
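
To make the "one large ring" behaviour concrete, here is a minimal sketch of
SimpleStrategy-style placement. It is an illustration only, not Cassandra's
code: the ring, tokens, and node names are made up, and real tokens come from
the partitioner. The point it shows is that the first replica is chosen purely
by token order and data center membership is never consulted, so with
replication_factor 1 each row has a single copy that may land in either DC.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Return the nodes holding a key under a SimpleStrategy-like placement.

    ring: list of (token, node_name) pairs; tokens are assumed unique.
    The first replica is the node with the smallest token >= key_token
    (wrapping around), and further replicas are the next nodes clockwise.
    Data center membership is never consulted.
    """
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    start = bisect_left(tokens, key_token) % len(ordered)
    count = min(replication_factor, len(ordered))
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]


# Hypothetical four-node ring with nodes from both DCs interleaved.
RING = [(0, "dc1-a"), (25, "dc2-a"), (50, "dc1-b"), (75, "dc2-b")]

for key_token in (10, 30, 60, 90):
    print(key_token, simple_strategy_replicas(RING, key_token, 1))
```

In this toy ring the four sample keys land on dc2-a, dc1-b, dc2-b and dc1-a
respectively, which is the even spread across both DCs asked about here,
without a second copy being shipped to the other DC.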


On Thu, Nov 10, 2011 at 1:30 PM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:

> I am pretty sure the way you have K1 configured it will be placed across
> both DC's as if you had large ring.  If you want it only in DC1 you need to
> say DC1:1, DC2:0.
> If you are writing and reading at ONE you are not guaranteed to get the
> data if RF > 1.  If RF = 2, and you write with ONE, your data could be
> written to server 1, and then read from server 2 before it gets over there.
>
> The differing server times will only really matter for TTL's.  Most
> everything else works off comparing user supplied times.
>
> -Jeremiah
>
>
> On 11/10/2011 02:27 PM, Subrahmanya Harve wrote:
>
>>
>> I am facing an issue in 0.8.7 cluster -
>>
>> - I have two clusters in two DCs (rather one cross dc cluster) and two
>> keyspaces. But i have only configured one keyspace to replicate data to the
>> other DC and the other keyspace to not replicate over to the other DC.
>> Basically this is the way i ran the keyspace creation  -
>> create keyspace K1 with
>> placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and
>> strategy_options = [{replication_factor:1}];
>> create keyspace K2 with
>> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'
>> and strategy_options = [{DC1:2, DC2:2}];
>>
>> I had to do this because i expect that K1 will get a large volume of data
>> and i do not want this wired over to the other DC.
>>
>> I am writing the data at CL=ONE and reading the data at CL=ONE. I am
>> seeing an issue where sometimes i get the data and other times i do not see
>> the data. Does anyone know what could be going on here?
>>
>> A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i can
>> see that there are large changes in the yaml file, but a specific question
>> i had was - how do i configure disk_access_mode like it used to be in 0.7.4?
>>
>> One observation i have made is that some nodes of the cross dc cluster
>> are at different system times. This is something to fix but could this be
>> why data is sometimes retrieved and other times not? Or is there some other
>> thing to it?
>>
>> Would appreciate a quick response.
>>
>
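
Jeremiah's point about reading and writing at ONE follows from the usual
overlap rule: a read is only guaranteed to see the latest acknowledged write
when the number of replicas read plus the number written exceeds the
replication factor. The sketch below is a hedged illustration of that
arithmetic; the function names are hypothetical and only ONE, QUORUM and ALL
are modelled.

```python
def required_replicas(level, replication_factor):
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % level)


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when every read replica set must overlap every write replica set."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


for rf in (1, 2, 4):
    print(rf, read_sees_latest_write("ONE", "ONE", rf),
          read_sees_latest_write("QUORUM", "QUORUM", rf))
```

With a replication factor of 1, ONE/ONE already overlaps; with 2 or more it
does not, and QUORUM on both sides (or ALL on one side) is needed for the
guarantee.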


Re: Queries on AuthN and AuthZ for multi tenant Cassandra

2012-02-01 Thread Subrahmanya Harve
Thanks for the response Aaron.

We do not anticipate more than 10-15 tenants on the cluster. Even if one
does decide to create one KS per tenant, there is the problem of variable
loads on the KS's. I went through this link
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
which does promise better memory management. I did have two more questions -
- Was the new memory management written taking into account a situation of
many KS's? (In other words, did multi-tenancy influence the re-design of
memory management?)
- I know that users trying out multi-tenancy generally recommend not
creating many KS's/CF's, but I am wondering if there is any documentation
on why, or details on the negative impact on memory/performance? And are
there any performance benchmarks available for Cassandra 1.0 clusters with
many KS's?


On Wed, Feb 1, 2012 at 12:11 PM, aaron morton wrote:

> The existing authentication plug-in does not support row level
> authorization.
>
> You will need to add authentication to your API layer to ensure that a
> request from client X always has the client X key prefix. Or modify
> cassandra to provide row level authentication.
>
> The 1.x Memtable memory management is awesome, but I would still be
> hesitant about creating KS's and CF's at the request of an API client.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/02/2012, at 8:52 AM, Subrahmanya Harve wrote:
>
> > We are using Cassandra 0.8.7 and building a multi-tenant Cassandra platform
> > where we have a common KS and common CFs for all tenants. By using Hector's
> > virtual keyspaces, we are able to modify row keys to have a tenant-specific
> > id. (Note that we do not allow tenants to modify/create KS/CF. We just allow
> > tenants to write and read data.) However, we are in the process of
> > adding authentication and authorization on top of this platform such that
> > no tenant should be able to retrieve data belonging to any other tenant.
> >
> > By configuring Cassandra for security using the documentation here -
> > http://www.datastax.com/docs/0.8/configuration/authentication , we were
> > able to apply the security constraints on the common keyspace and common
> > CFs. However this does not prevent a tenant from retrieving data belonging
> > to another tenant. To prevent that, we would need to have separate CFs
> > and/or keyspaces for each tenant.
> > Looking for more information on the topic here
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-Multi-tenancy-and-authentication-and-authorization-td5935230.html
> > and other places, it looks like the recommendation is "not" to create
> > separate CFs and KSs for every tenant as this would have impacts on
> > Memtables and other memory issues. Does this recommendation still hold
> > good?
> > With jiras like
> > https://issues.apache.org/jira/browse/CASSANDRA-2006 resolved, does it
> > mean we can now create multiple (but limited) CFs and KSs?
> > More generally, how do we prevent a tenant from intentional/accidental
> > manipulation of data owned by another tenant? (given that all tenants will
> > provide the right credentials)
>
>
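
Aaron's suggestion of enforcing the key prefix in the API layer could look
roughly like the sketch below. It is not Hector's virtual keyspace API; the
separator, the function names, and the exception type are all assumptions
made for illustration. The essential idea is that the tenant id comes from
the authenticated session, never from the request payload, and that every
key is prefixed on the way in and checked on the way out.

```python
class TenantKeyError(ValueError):
    """Raised when a row key does not belong to the calling tenant."""


SEPARATOR = ":"  # assumed delimiter; tenant ids must not contain it


def to_storage_key(tenant_id, row_key):
    """Build the key actually stored, always prefixed by the caller's tenant id."""
    if not tenant_id or SEPARATOR in tenant_id:
        raise TenantKeyError("invalid tenant id: %r" % tenant_id)
    return tenant_id + SEPARATOR + row_key


def from_storage_key(tenant_id, storage_key):
    """Strip the prefix on the way out, refusing keys owned by another tenant."""
    prefix = tenant_id + SEPARATOR
    if not storage_key.startswith(prefix):
        raise TenantKeyError("key is not owned by tenant %r" % tenant_id)
    return storage_key[len(prefix):]
```

Because clients only ever supply the unprefixed row key, a tenant cannot
address another tenant's rows even when both present valid credentials for
the shared keyspace.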


Re: Queries on AuthN and AuthZ for multi tenant Cassandra

2012-02-02 Thread Subrahmanya Harve
Thank you.
Is that true for 0.8.7 as well?


On Thu, Feb 2, 2012 at 8:20 AM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:

> 10-15 KS should be fine.  The issue is when you want to have hundreds or
> thousands of KS/CF.
>
> -Jeremiah
>
> -----Original Message-----
> From: Subrahmanya Harve [mailto:subrahmanyaha...@gmail.com]
> Sent: Thursday, February 02, 2012 1:43 AM
> To: dev@cassandra.apache.org
> Subject: Re: Queries on AuthN and AuthZ for multi tenant Cassandra
>
> Thanks for the response Aaron.
>
> We do not anticipate more than 10-15 tenants on the cluster. Even if one
> does decide to create one KS per tenant, there is the problem of variable
> loads on the KS's. I went through this link
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
> which does promise better memory management. I did have two more questions -
> - Was the new memory management written taking into account a situation of
> many KS's? (In other words, did multi-tenancy influence the re-design of
> memory management?)
> - I know that users trying out multi-tenancy generally recommend not
> creating many KS's/CF's, but I am wondering if there is any documentation
> on why, or details on the negative impact on memory/performance? And are
> there any performance benchmarks available for Cassandra 1.0 clusters with
> many KS's?
>
>
> On Wed, Feb 1, 2012 at 12:11 PM, aaron morton
> wrote:
>
> > The existing authentication plug-in does not support row level
> > authorization.
> >
> > You will need to add authentication to your API layer to ensure that a
> > request from client X always has the client X key prefix. Or modify
> > cassandra to provide row level authentication.
> >
> > The 1.x Memtable memory management is awesome, but I would still be
> > hesitant about creating KS's and CF's at the request of an API client.
> >
> > Cheers
> >
> >
> > -
> > Aaron Morton
> > Freelance Developer
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 2/02/2012, at 8:52 AM, Subrahmanya Harve wrote:
> >
> > > We are using Cassandra 0.8.7 and building a multi-tenant Cassandra platform
> > > where we have a common KS and common CFs for all tenants. By using Hector's
> > > virtual keyspaces, we are able to modify row keys to have a tenant-specific
> > > id. (Note that we do not allow tenants to modify/create KS/CF. We just allow
> > > tenants to write and read data.) However, we are in the process of
> > > adding authentication and authorization on top of this platform such that
> > > no tenant should be able to retrieve data belonging to any other tenant.
> > >
> > > By configuring Cassandra for security using the documentation here -
> > > http://www.datastax.com/docs/0.8/configuration/authentication , we were
> > > able to apply the security constraints on the common keyspace and common
> > > CFs. However this does not prevent a tenant from retrieving data belonging
> > > to another tenant. To prevent that, we would need to have separate CFs
> > > and/or keyspaces for each tenant.
> > > Looking for more information on the topic here
> > > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-Multi-tenancy-and-authentication-and-authorization-td5935230.html
> > > and other places, it looks like the recommendation is "not" to create
> > > separate CFs and KSs for every tenant as this would have impacts on
> > > Memtables and other memory issues. Does this recommendation still hold
> > > good?
> > > With jiras like
> > > https://issues.apache.org/jira/browse/CASSANDRA-2006 resolved, does it
> > > mean we can now create multiple (but limited) CFs and KSs?
> > > More generally, how do we prevent a tenant from intentional/accidental
> > > manipulation of data owned by another tenant? (given that all tenants will
> > > provide the right credentials)
> >
> >
>