Re: Cassandra read throughput with little/no caching.

2012-12-28 Thread Yiming Sun
bit meager. Also is it possible to batch the writes together? -- Y. On Mon, Dec 24, 2012 at 7:28 AM, James Masson wrote: > > > On 21/12/12 17:56, Yiming Sun wrote: > >> James, you could experiment with Row cache, with off-heap JNA cache, and >> see if it helps. My own expe

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
with LRU, but given the small data space you have, you may be able to fit the data from one column family entirely into the row cache. On Fri, Dec 21, 2012 at 12:03 PM, James Masson wrote: > > > On 21/12/12 16:27, Yiming Sun wrote: > >> James, using RandomPartitioner, the orde

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
> > thanks for the reply > > > On 21/12/12 14:36, Yiming Sun wrote: > >> I have a few questions for you, James, >> >> 1. how many nodes are in your Cassandra ring? >> > > 2 or 3 - depending on environment - it doesn't seem to make a difference > to

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
I have a few questions for you, James, 1. how many nodes are in your Cassandra ring? 2. what is the replication factor? 3. when you say sequentially, what do you mean? what Partitioner do you use? 4. how many columns per row? how much data per row? per column? 5. what client library do you use

Re: strange row cache behavior

2012-12-04 Thread Yiming Sun
sandra Developer > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 5/12/2012, at 4:23 AM, Yiming Sun wrote: > > Hi Aaron, > > Thank you, and your explanation makes sense. At the time, I thought having > 1GB of row cache on each node was plenty enoug

Re: Row caching + Wide row column family == almost crashed?

2012-12-04 Thread Yiming Sun
.com > > On 4/12/2012, at 5:31 PM, Yiming Sun wrote: > > I ran into a different problem with Row cache recently, sent a message to > the list, but it didn't get picked up. I am hoping someone can help me > understand the issue. Our data also has rather wide rows, not necessari

Re: strange row cache behavior

2012-12-04 Thread Yiming Sun
lance Cassandra Developer > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 1/12/2012, at 5:11 AM, Yiming Sun wrote: > > > Does anyone have any comments/suggestions for me regarding this? Thanks > > > > > > I am trying to understand

Re: Row caching + Wide row column family == almost crashed?

2012-12-03 Thread Yiming Sun
I ran into a different problem with Row cache recently, sent a message to the list, but it didn't get picked up. I am hoping someone can help me understand the issue. Our data also has rather wide rows, not necessarily in the thousands range, but definitely in the upper-hundreds levels. They ar

Re: strange row cache behavior

2012-11-30 Thread Yiming Sun
Does anyone have any comments/suggestions for me regarding this? Thanks I am trying to understand some strange behavior of cassandra row cache. We > have a 6-node Cassandra cluster in a single data center on 2 racks, and the > neighboring nodes on the ring are from alternative racks. Each node

Re: need some help with row cache

2012-11-28 Thread Yiming Sun
r1 Up Normal 590.26 GB 33.33% 113427455640312821154458202477256070484 x.x.x.6 DC1 r2 Up Normal 583.21 GB 33.33% 141784319550391026443072753096570088105 On Wed, Nov 28, 2012 at 9:09 AM, Yiming Sun wrote: > Thanks guys. H

Re: need some help with row cache

2012-11-28 Thread Yiming Sun
to use cassandra-cli. > > -Bryan > > > On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wrote: > > Use cassandracli. > > > > > > Thanks. > > -Wei > > > > Sent from my Samsung smartphone on AT&T > > > > > > -

Re: need some help with row cache

2012-11-27 Thread Yiming Sun
Also, what command can I use to see the "caching" setting? "DESC TABLE " doesn't list caching at all. Thanks. -- Y. On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun wrote: > Hi Bryan, > > Thank you very much for this information. So in other words, the setti

Re: need some help with row cache

2012-11-27 Thread Yiming Sun
Nov 27, 2012 at 8:16 PM, Yiming Sun wrote: > > Hello, > > > > but it is not clear to me where this setting belongs to, because even in > the > > v1.1.6 conf/cassandra.yaml, there is no such property, and apparently > > adding this property to the yaml cau

Re: key cache size in mb = 0 doesn't work?

2012-11-26 Thread Yiming Sun
> being exposed over anything other than JMX, sorry. You can poke around with > nodetool on the nodes themselves ( that uses JMX itself ) to get other > metrics, but I have not been able to get cache stats out of nodetool > cfstats. I think it might be because they are no longer on the CF

Re: key cache size in mb = 0 doesn't work?

2012-11-26 Thread Yiming Sun
m> wrote: > > SSTables in http://en.wikipedia.org/wiki/Page_cache maybe? > How many rows do you have in this CF? Are you getting all columns? > > What do the cassandra.db mbeans say ( hit ratio, cache requests, items etc > ) > > regards, > Andras > > On 27 Nov

key cache size in mb = 0 doesn't work?

2012-11-26 Thread Yiming Sun
Hi, I am carrying out some performance tests against a 6-node cassandra cluster (v1.1.0), and need to disable the key cache entirely as one of the scenarios. However, by setting key_cache_size_in_mb to 0, I am still seeing caching effects. For example, when I fetch a set of 5000 rows, the first t

Re: schema fail to load on some nodes

2012-05-22 Thread Yiming Sun
ira/browse/CASSANDRA-4269 > > > On Tue, May 22, 2012 at 4:10 PM, Yiming Sun wrote: > >> Hi, >> >> We are setting up a 6-node cassandra cluster within one data center. 3 >> in rack1 and the other 3 in rack2. The tokens are assigned alternating >> between

schema fail to load on some nodes

2012-05-22 Thread Yiming Sun
Hi, We are setting up a 6-node cassandra cluster within one data center. 3 in rack1 and the other 3 in rack2. The tokens are assigned alternating between rack 1 and rack 2. There is one seed node in each rack. Below is the ring: r1-node1 DC1 r1 0 (seed) r2-node1 DC1

Re: Correct way to set strategy options in cqlsh?

2012-05-22 Thread Yiming Sun
AND strategy_options={us-east:1, us-west:1}; On Tue, May 22, 2012 at 11:10 AM, Damick, Jeffrey < jeffrey.dam...@neustar.biz> wrote: > What’s the correct way to set the strategy options for the > networktopologystrategy with cqlsh? > > I’ve tried several variations, but what’s expected way to e

Re: how can we get (a lot) more performance from cassandra

2012-05-21 Thread Yiming Sun
long for a single thread > to make this call" > > In a low write environment reads should be flying along. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 17/05/2012, at 1:44 PM,

Re: need some clarification on recommended memory size

2012-05-17 Thread Yiming Sun
Hi Aaron, Thank you for guiding us by breaking down the issue. Please see my answers embedded > Is this a single client ? Yes > How many columns is it asking for ? the client knows a list of all row keys, and it randomly picks 100, and loops 100 times. It first reads a metadata column to fig

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Hi Aaron T., No, actually we haven't, but this sounds like a good suggestion. I can definitely try THIS before jumping into other things such as enabling row cache etc. Thanks! -- Y. On Wed, May 16, 2012 at 9:38 PM, Aaron Turner wrote: > On Wed, May 16, 2012 at 12:59 PM, Yiming Su

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
012-05-16 20:38:37 +, Yiming Sun said: > > > Thanks Oleg. Another caveat from our side is, we have a very large data > space (imagine picking 100 items out of 3 million, the chance of having 2 > items from the same bin is pretty low). We will experiment with row cache, > and hop

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
> >> On the bright side - Cassandra read throughput will remain consistent, >> regardless of your volume. But you are going to have to "wrap" your reads >> with memcache (or redis), so that the bulk of your reads can be served from >> memory. >> >> >>

Re: how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
s > with memcache (or redis), so that the bulk of your reads can be served from > memory. > > > Thanks, > Mike Peters > > > On 5/16/2012 3:59 PM, Yiming Sun wrote: > > Hello, > > I asked the question as a follow-up under a different thread, so I > figure

how can we get (a lot) more performance from cassandra

2012-05-16 Thread Yiming Sun
Hello, I asked the question as a follow-up under a different thread, so I figure I should ask here instead in case the other one gets buried, and besides, I have a little more information. "We find the lack of performance disturbing" as we are only able to get about 3-4MB/sec read performance out

Re: need some clarification on recommended memory size

2012-05-16 Thread Yiming Sun
Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 16/05/2012, at 11:12 AM, Yiming Sun wrote: > > Thanks Tyler... so my understanding is, even if Cassandra doesn't do > off-heap caching, by having a large-enough memory, it minimizes the chance > o

Re: need some clarification on recommended memory size

2012-05-15 Thread Yiming Sun
3:19 PM, Yiming Sun wrote: > >> Hello, >> >> I was reading the Apache Cassandra 1.0 Documentation PDF dated May 10, >> 2012, and had some questions on what the recommended memory size is. >> >> Below is the snippet from the PDF. Bullet 1 suggests to have 16-32GB

Re: RE 200TB in Cassandra ?

2012-04-19 Thread Yiming Sun
600 TB is really a lot, even 200 TB is a lot. In our organization, storage at such scale is handled by our storage team and they purchase specialized (and very expensive) equipment from storage hardware vendors because at this scale, performance and reliability are absolutely critical. But it soun

Re: data size difference between supercolumn and regular column

2012-04-06 Thread Yiming Sun
on. > Don't forget to unlimit number of file descriptors, and monitor tpstats > and iostat. > > maki > > From iPhone > > > On 2012/04/04, at 22:19, Yiming Sun wrote: > > Cool, I will look into this new leveled compaction strategy and give it a > try. > >

Re: data size difference between supercolumn and regular column

2012-04-04 Thread Yiming Sun
h other people, turn on compaction. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 3/04/2012, at 9:19 AM, Yiming Sun wrote: > > Yup Jeremiah, I learned a hard lesson on how cassa

Re: data size difference between supercolumn and regular column

2012-04-02 Thread Yiming Sun
of disk space. > You really want to try and stay around 50%, 60-70% works, but only if it > is spread across multiple column families, and even then you can run into > issues when doing repairs. > > -Jeremiah > > > > On Apr 1, 2012, at 9:44 PM, Yiming Sun wrote: &

Re: data size difference between supercolumn and regular column

2012-04-01 Thread Yiming Sun
There is some overhead involved in the super columns: the super col name, > length of the name and the number of columns. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 29/03/2012, at 9:47 AM, Y

Re: a question on cassandra data file size

2012-03-30 Thread Yiming Sun
> > On Fri, Mar 30, 2012 at 9:01 AM, Yiming Sun wrote: > > Hi, > > > > I have a question on the size of cassandra data files. After we upgraded > > from cassandra 0.8 to 1.0, and changed our schema to use regular columns > > instead of supercolumns, the aggregated

a question on cassandra data file size

2012-03-30 Thread Yiming Sun
Hi, I have a question on the size of cassandra data files. After we upgraded from cassandra 0.8 to 1.0, and changed our schema to use regular columns instead of supercolumns, the aggregated size of cassandra data files reduced by more than half. The source data set is the same, and we didn't set

Re: data size difference between supercolumn and regular column

2012-03-28 Thread Yiming Sun
data reduction purely because of the schema change? Thanks. -- Y. On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun wrote: > Hi, > > We are trying to estimate the amount of storage we need for a production > cassandra cluster. While I was doing the calculation, I noticed a very > dr

data size difference between supercolumn and regular column

2012-03-28 Thread Yiming Sun
Hi, We are trying to estimate the amount of storage we need for a production cassandra cluster. While I was doing the calculation, I noticed a very dramatic difference in terms of storage space used by cassandra data files. Our previous setup consists of a single-node cassandra 0.8.x with no rep

Re: what other ports than 7199 need to be open for nodetool to work?

2012-03-26 Thread Yiming Sun
detool is often > used to connect to 'localhost' which generally does not have any > firewall rules at all so it usually works. It is still connecting to a > random second port though. > > On Mon, Mar 26, 2012 at 2:42 PM, Yiming Sun wrote: > > Hi, > > > >

what other ports than 7199 need to be open for nodetool to work?

2012-03-26 Thread Yiming Sun
Hi, We opened port 7199 on a cassandra node, but were unable to get a nodetool to talk to it remotely unless we turn off the firewall entirely. So what other ports should be opened for this -- online posts all indicate that JMX uses a random dynamic port, which would be difficult to create a fire

Re: yet a couple more questions on composite columns

2012-02-06 Thread Yiming Sun
Thanks for the clarification, Jim. I didn't know the first comparator was defined as DateType. Yeah, in that case, the beginning of the epoch is the only choice. -- Y. On Mon, Feb 6, 2012 at 11:35 AM, Jim Ancona wrote: > On Sat, Feb 4, 2012 at 8:54 PM, Yiming Sun wrote: > > In

Re: yet a couple more questions on composite columns

2012-02-05 Thread Yiming Sun
e: > > FilesMeta > FilesData > > > 2012/2/5 Yiming Sun > >> Interesting idea, Jim. Is there a reason you don't you use >> "metadata:{accountId}" instead? For performance reasons? >> >> >> On Sat, Feb 4, 2012 at 6:24 PM, Jim Anco

Re: yet a couple more questions on composite columns

2012-02-04 Thread Yiming Sun
ata columns, e.g. a column of > 1970-01-01:{accountId} for a metadata column where the Composite is > DateType:UTF8Type. > > Jim > > On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun wrote: > > Thanks Andrey and Chris. It sounds like we don't necessarily have to use > >

Re: yet a couple more questions on composite columns

2012-02-04 Thread Yiming Sun
m both. When you have A, you can fetch B, and vice versa. > > > 2012/2/4 Yiming Sun > >> Interesting idea, R.V. But what did you do with the row keys? >> >> >> On Sat, Feb 4, 2012 at 2:29 PM, R. Verlangen wrote: >> >>> I also made something

Re: yet a couple more questions on composite columns

2012-02-04 Thread Yiming Sun
really good at reading, so this should not be an issue. > > Cheers! > > > 2012/2/4 Yiming Sun > >> Thanks Andrey and Chris. It sounds like we don't necessarily have to use >> composite columns. From what I understand about dynamic CF, each row may >> have comp

Re: yet a couple more questions on composite columns

2012-02-04 Thread Yiming Sun
On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken wrote: > > > > On 4 February 2012 06:21, Yiming Sun wrote: > >> I cannot have one composite column name with 3 components while another >> with 4 components? > > Just put 4 components and left last empty (if it is

yet a couple more questions on composite columns

2012-02-03 Thread Yiming Sun
Hi, We are getting close to replacing our super-column based schema with something more efficient, and I am trying to wrap my head around composite columns. An email from another list member has clarified that composite and non-composite columns should not be mixed in the same CF because only one

Re: mysterious data disappearance - what happened?

2011-09-10 Thread Yiming Sun
Hi Peter, Good call! I went and checked the seed, and indeed it was left unchanged when we copied the config yaml file from the first node to the second node. Thanks! -- Y. On Fri, Sep 9, 2011 at 7:47 PM, Peter Schuller wrote: > > cluster name for both machines. So in other words, if we want to l

mysterious data disappearance - what happened?

2011-09-08 Thread Yiming Sun
Hello, If two different instances of cassandra are running on separate machines, but both are unfortunately configured to use the default cluster name, "Test Cluster", do they gang up as one cluster (even though they were intended to be two separate stand-alone instances), so that dropping keyspac

Re: looking for information on composite columns

2011-09-02 Thread Yiming Sun
Thanks Edward. What's the link to your blog? On Fri, Sep 2, 2011 at 10:43 AM, Edward Capriolo wrote: > > On Fri, Sep 2, 2011 at 9:15 AM, Yiming Sun wrote: > >> Hi, >> >> I am looking for information/tutorials on the use of composite columns, >> including h

looking for information on composite columns

2011-09-02 Thread Yiming Sun
Hi, I am looking for information/tutorials on the use of composite columns, including how to use it, what kind of indexing it can offer, and its advantage over super columns. I googled but came up with very little information. There is a blog article from high performance cassandra on the compos

Re: 8 million Cassandra data files on disk

2011-08-02 Thread Yiming Sun
, Jonathan Ellis wrote: > I don't remember a removing-compacted-files bug in 0.7.0, but you > should absolutely upgrade to 0.7.8 for several dozen other fixes, > including some severe ones -- see NEWS.txt. > > On Tue, Aug 2, 2011 at 4:29 PM, Yiming Sun wrote: > > Hi Jerem

Re: 8 million Cassandra data files on disk

2011-08-02 Thread Yiming Sun
> On Tue, 2011-08-02 at 16:09 -0400, Yiming Sun wrote: > > Hi, > > > > I am new to Cassandra, and am hoping someone could help me understand > > the (large amount of small) data files on disk that Cassandra > > generates. > > > > The reason we are using Cassand

8 million Cassandra data files on disk

2011-08-02 Thread Yiming Sun
Hi, I am new to Cassandra, and am hoping someone could help me understand the (large amount of small) data files on disk that Cassandra generates. The reason we are using Cassandra is because we are dealing with thousands to millions of small text files on disk, so we are experimenting with Cassa