Re: question about replicas & dynamic response to load

2011-03-06 Thread Shaun Cutts
our own solution? For immutable data, it's what they excel at. >> Cassandra has amazing write capacity and its design focus is on scaling >> writes. I would not really consider it a good tool for the job of serving >> massive amounts of static content. >> >> Dan >>

question about replicas & dynamic response to load

2011-03-03 Thread Shaun Cutts
Hello, in our project our usage pattern is likely to be quite variable -- high for a few days, then lower, etc. It could vary by as much as 10x (or more) from peak to "non-peak". Also, much of our data is immutable -- but there is a considerable amount of it -- perhaps in the single-digit TBs. Final

Re: limit on rows in a cf

2011-03-01 Thread Shaun Cutts
This isn't quite true, I think. RandomPartitioner uses MD5. So if you had 10^16 rows, you would have roughly a 10^-6 chance of a collision, according to http://en.wikipedia.org/wiki/Birthday_attack ... and apparently MD5 isn't quite balanced, so your actual odds of a collision are worse (though I'm not
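As a rough check on the collision figure, the usual birthday approximation p ≈ 1 − exp(−n²/(2N)) can be computed directly. This sketch assumes MD5 output is uniformly spread over the full 128-bit space (the post notes MD5 may not be perfectly balanced, which would make the real odds worse). With n = 10^16 it gives about 1.5×10^-7, so 10^-6 is a conservative round figure. The function name is illustrative only.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items uniform hashes collide.

    Birthday approximation: p ~= 1 - exp(-n^2 / (2N)), with N = 2**hash_bits.
    Assumes the hash output is uniformly distributed (an idealization of MD5).
    """
    space = 2.0 ** hash_bits
    exponent = -(float(n_items) ** 2) / (2.0 * space)
    # expm1 keeps precision when the probability is very small
    return -math.expm1(exponent)

# 10^16 row keys hashed into a 128-bit space
print(f"{collision_probability(10**16, 128):.2e}")  # prints 1.47e-07
```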

Re: How can I implement text based searching for the data/entities/items stored in Cassandra ?

2011-02-12 Thread Shaun Cutts
There are Lucandra and Solandra: https://github.com/tjake/Lucandra -- Shaun On Feb 12, 2011, at 6:57 AM, Aklin_81 wrote: > I would like to text search for some of Entities/items stored in the > database through an AJAX powered application...Such that the user > starts typing and he can get

Re: Best way to detect/fix bitrot today?

2011-02-08 Thread Shaun Cutts
One thing that we're doing for (guaranteed) immutable data is to use MD5 signatures as keys... this also prevents duplication, and it makes detection (if not correction) of bitrot at the app level easy. On Feb 8, 2011, at 9:23 AM, Anand Somani wrote: > I should have clarified we have 3
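A minimal sketch of the MD5-as-key idea for immutable data. A plain in-memory dict stands in for the column family (an assumption; the post doesn't show the real client code), and ContentAddressedStore is a hypothetical name. Keying by content gives deduplication for free, and re-hashing on read flags corrupted values.

```python
import hashlib
from typing import Dict

class ContentAddressedStore:
    """Toy store keyed by the MD5 hex digest of each immutable value.

    The dict below is a hypothetical stand-in for a real column family.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, value: bytes) -> str:
        key = hashlib.md5(value).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self._rows.setdefault(key, value)
        return key

    def get(self, key: str) -> bytes:
        value = self._rows[key]
        # Re-hash on read: a mismatch means the stored bytes changed (bitrot).
        if hashlib.md5(value).hexdigest() != key:
            raise ValueError(f"checksum mismatch for key {key}: possible bitrot")
        return value
```

As in the post, this only detects corruption; it does not repair the damaged value.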

Re: Do supercolumns have a purpose?

2011-02-07 Thread Shaun Cutts
I'm a newbie here, but, with apologies for my presumptuousness, I think you should deprecate SuperColumns. They are already distracting you, and as the years go by, the cost of supporting them as you add more and more functionality is only likely to get worse. It would be better to concentrate o

Re: order of index expressions

2011-02-07 Thread Shaun Cutts
Jonathan, Thanks for your thoughts > On Sun, Feb 6, 2011 at 11:03 AM, Shaun Cutts wrote: >> What I think you should be doing is the following: open iterators on the >> matching keys for each of the indexes; the inside loop would pick an >> iterator at random, and
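The quoted proposal is cut off after "pick an iterator at random, and", so the sketch below does not reconstruct the rest of it. It only illustrates the general idea of opening one iterator over matching keys per index and intersecting them, advancing a randomly chosen lagging iterator on each step. It assumes (hypothetically) that every index yields row keys in the same strictly ascending order; intersect_sorted is an illustrative name, not Cassandra code.

```python
import random
from typing import Iterable, Iterator, List, Optional

_END = object()  # sentinel marking an exhausted iterator

def intersect_sorted(streams: List[Iterable[str]],
                     rng: Optional[random.Random] = None) -> Iterator[str]:
    """Yield the keys that appear in every stream.

    Assumes each stream yields keys in strictly ascending order and that all
    streams share that ordering. An empty list of streams yields nothing.
    """
    if rng is None:
        rng = random.Random()
    iters = [iter(s) for s in streams]
    heads = [next(it, _END) for it in iters]
    if not heads or any(h is _END for h in heads):
        return
    while True:
        candidate = max(heads)
        lagging = [i for i, h in enumerate(heads) if h < candidate]
        if not lagging:
            # Every iterator agrees, so this key matches every index expression.
            yield candidate
            heads = [next(it, _END) for it in iters]
            if any(h is _END for h in heads):
                return
            continue
        # Pick one lagging iterator at random and advance it up to the candidate.
        i = rng.choice(lagging)
        while heads[i] < candidate:
            heads[i] = next(iters[i], _END)
            if heads[i] is _END:
                return
```

The result does not depend on which iterator is picked; the random choice only affects the order of the work. Stepping one key at a time is the simplest form; an index that supports seeking to a key could skip ahead instead.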

Re: Finding the intersection results of column sets of two rows

2011-02-06 Thread Shaun Cutts
mounts of denormalization. > And finding columns in client would require pulling unnecessary > columns like pulling 100,000 columns from a row of which only 60-70 > are required. > > Shaun, I hope my above clarification has clarified things a bit. Yes, > the rows, of which I
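The client-side intersection being discussed amounts to a merge over the two rows' column names, which works because the columns of a row come back sorted by the column family's comparator. This sketch assumes string column names compared in Python's default order as a stand-in for the real comparator; intersect_columns is a hypothetical name. It still requires fetching both full column sets to the client, which is the cost objected to in the quoted message.

```python
from typing import List, Sequence

def intersect_columns(row_a: Sequence[str], row_b: Sequence[str]) -> List[str]:
    """Return the column names present in both rows.

    Both inputs must already be sorted by the same ordering, so a single
    linear merge pass finds every common name.
    """
    i = j = 0
    common: List[str] = []
    while i < len(row_a) and j < len(row_b):
        if row_a[i] == row_b[j]:
            common.append(row_a[i])
            i += 1
            j += 1
        elif row_a[i] < row_b[j]:
            i += 1
        else:
            j += 1
    return common
```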

Re: Finding the intersection results of column sets of two rows

2011-02-06 Thread Shaun Cutts
In theory, you should be able to do joins by creating an extra column in one column family, holding the "foreign key" of the matching row in the other family. This assumes that the info you are joining on is available in both CFs (and is not derived via some sort of functional transformation). I have just fo
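A minimal sketch of the foreign-key-column join, using nested dicts (row key to {column name: value}) as hypothetical stand-ins for two column families; no client library is involved. join_on_foreign_key and the fk_column argument are illustrative names. With a real cluster each lookup into the second family would be a separate read, or one batched multi-key read, so the join happens on the client.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
ColumnFamily = Dict[str, Row]  # row key -> {column name: value}

def join_on_foreign_key(left: ColumnFamily, right: ColumnFamily,
                        fk_column: str) -> List[Tuple[str, Row, Row]]:
    """Pair each row of `left` with the `right` row named in its fk_column.

    Rows lacking the column, or pointing at a key absent from `right`,
    are skipped (an inner join).
    """
    joined: List[Tuple[str, Row, Row]] = []
    for key, row in left.items():
        target = row.get(fk_column)
        if target is not None and target in right:
            joined.append((key, row, right[target]))
    return joined
```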

Re: order of index expressions

2011-02-06 Thread Shaun Cutts
y the strategy I just mentioned is pretty general (not depending on histograms, etc.). Does it sound like a good idea? -- Shaun On Feb 6, 2011, at 12:15 AM, Jonathan Ellis wrote: > ColumnFamilyStore.scan > > On Sat, Feb 5, 2011 at 10:32 PM, Shaun Cutts wrote: >> Thanks for the r

Re: order of index expressions

2011-02-05 Thread Shaun Cutts
ering where the code that does this is... is it in java.org.apache.cassandra.db.columniterator.IndexedSliceReader? Thanks, -- Shaun On Feb 5, 2011, at 2:39 PM, Jonathan Ellis wrote: > On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts wrote: >> Hello, >> I'm wondering if cass

order of index expressions

2011-02-05 Thread Shaun Cutts
Hello, I'm wondering if Cassandra is sensitive to the order of index expressions in the pycassa call get_indexed_slices? If I have several column indexes available, will it attempt to optimize the order? Thanks, -- Shaun
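To make the question concrete, "optimizing the order" could mean something like the sketch below: take candidate keys from the most selective equality expression (the one whose index lists the fewest keys), then filter them against the remaining expressions. This is a hypothetical planner over in-memory dicts, not a description of what Cassandra does (the thread points to ColumnFamilyStore.scan for that); evaluate_indexed is an illustrative name. For an AND of equality expressions the result is the same in any order; only the amount of work changes.

```python
from typing import Dict, List, Set, Tuple

Expression = Tuple[str, str]  # (column name, required value); equality only

def evaluate_indexed(expressions: List[Expression],
                     indexes: Dict[Expression, Set[str]],
                     rows: Dict[str, Dict[str, str]]) -> List[str]:
    """Return sorted keys of rows that satisfy every equality expression.

    Hypothetical plan: scan the index with the fewest matching keys first,
    then check each candidate row against the remaining expressions.
    """
    if not expressions:
        raise ValueError("at least one expression is required")
    ordered = sorted(expressions, key=lambda e: len(indexes.get(e, set())))
    first, rest = ordered[0], ordered[1:]
    matches: List[str] = []
    for key in sorted(indexes.get(first, set())):
        row = rows.get(key, {})
        if all(row.get(col) == val for col, val in rest):
            matches.append(key)
    return matches
```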