Re: Under expectation response time for reads

2011-01-20 Thread Miguel Verde
Disable Nagle's algorithm and you should see much better performance. It must not be used on loopback. http://markmail.org/message/rgauuflglwemm24o On Thu, Jan 20, 2011 at 6:24 AM, George Ciubotaru < george.ciubot...@weedle.com> wrote: > Hello, > > We are in the process of evaluating Cassandra t
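Editor's sketch of what disabling Nagle's algorithm looks like on a client socket. It is a minimal illustration using Python's standard `socket` module, not the code from the thread; the helper name `make_nodelay_socket` is hypothetical, and a Thrift client would normally set this option through its transport's own socket.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the kernel to send small writes immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The option is set before connecting, so every request written on the connection is sent without the coalescing delay.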

Re: server needs thrift to run also?

2010-07-12 Thread Miguel Verde
I'll take a guess. S Ahmed, the Thrift compiler takes a .thrift file and can generate client and server code for it in your language of choice. This code depends on the Thrift runtime library in that language. For instance, the Thrift Java runtime library is bundled with Cassandra as a jar. Whe

Re: Coke Products at Digg?

2010-07-07 Thread Miguel Verde
Is it true the Mexican engineers have managed to remove the dependency on HCFS? That should spur uptake. On Wed, Jul 7, 2010 at 11:45 AM, Chris Goffinet wrote: > Hahaha. > > Well.. I can comment that we do still have coke products, we have been > doing Cosco runs of recent, and now serve Mexica

Re: Coke Products at Digg?

2010-07-07 Thread Miguel Verde
Dr. Pepper has recently been picked up by Coca Cola as well. I wonder if the UnCola solutions like 7Up and Fanta are just a fad? On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone wrote: > On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote: > >> >> I heard a rumor that Digg was moving away from Coca-C

Re: AVRO client API

2010-06-18 Thread Miguel Verde
On Fri, Jun 18, 2010 at 6:23 PM, Tatu Saloranta wrote: > Not that I wanted to criticize choices, but do they actually allow use > of JSON as encoding? > Avro does use JSON for specifying schemas, but I wasn't aware of being > able to use it for encoding data. > Likewise with Thrift. > Yes, each

Re: Pelops - a new Java client library paradigm

2010-06-12 Thread Miguel Verde
afaik, Cassandra does nothing to guarantee connection-level read your own writes consistency beyond its usual consistency levels. See https://issues.apache.org/jira/browse/CASSANDRA-876 and the earlier http://issues.apache.org/jira/browse/CASSANDRA-132 On Jun 12, 2010, at 5:48 PM, Dan Washu

Re: Best Timestamp?

2010-05-26 Thread Miguel Verde
Right, in C# this would be (not the most efficient way, but you get the idea): long timestamp = (DateTime.UtcNow.Ticks - new DateTime(1970, 1, 1).Ticks)/10; On Wed, May 26, 2010 at 4:50 PM, Mark Robson wrote: > On 26 May 2010 22:42, Steven Haar wrote: > >> What is the best timestamp to use while
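Editor's sketch of the same calculation in Python. A .NET tick is 100 nanoseconds, so dividing the tick difference by 10 in the C# expression yields microseconds since 1970-01-01. The function name `micros_since_epoch` is hypothetical, and measuring from the UTC epoch with a timezone-aware value is an assumption made here so that clients in different time zones produce comparable timestamps.

```python
from datetime import datetime, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_since_epoch(moment: datetime) -> int:
    """Return whole microseconds between the Unix epoch and `moment`.

    `moment` must be timezone-aware; subtracting a naive datetime from
    EPOCH raises TypeError.
    """
    delta = moment - EPOCH
    # Integer arithmetic avoids the rounding error of float seconds.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

A typical call would be `micros_since_epoch(datetime.now(timezone.utc))`.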

Re: Nunit Testing & Cassandra

2010-05-25 Thread Miguel Verde
) - tests that two collections are equivalent. Two collections are equivalent if they contain the same items, in any order. Assert.That(listOfKeys, Is.EquivalentTo (TestService.GetListOfRowKeysFromCF("ColumnFamilyName","Keyspace1"))); From: Miguel Verde [mailt

Re: Nunit Testing & Cassandra

2010-05-25 Thread Miguel Verde
It would be helpful to know in what way the test fails, or more information about listOfKeys or the return value of GetListOfRowKeysFromCF at assert time, or for that matter what GetListOfRowKeysFromCF is, or the insertion code. Also, does Is.EquivalentTo compare object equality on the items insid

Re: Cassandra data model for financial data

2010-05-13 Thread Miguel Verde
(e.g. sharesOutstanding) but just these could be in additional CFs. On Thu, May 13, 2010 at 3:58 PM, Benjamin Black wrote: > On Thu, May 13, 2010 at 12:45 PM, Miguel Verde > wrote: > > I also think that's not a good design, but only because the typical query > > would have to

Re: Cassandra data model for financial data

2010-05-13 Thread Miguel Verde
I also think that's not a good design, but only because the typical query would have to hit several column families instead of just one. To answer your question, use a http://wiki.apache.org/cassandra/API#KeyRange which includes AAPL across all years you might want in your http://wiki.apache.org/c

Re: Best way to store millisecond-accurate data

2010-05-04 Thread Miguel Verde
ow (let's say if the original timesharding is one day per row, then we would have two rows for that day). Maybe batch processes could do that. Best regards, Daniel. 2010/4/24 Miguel Verde TimeUUID's time component is measured in 100-nanosecond intervals. The library you use might cal

Re: Multiple keyspaces per application?

2010-04-27 Thread Miguel Verde
Replication Factor, the number of copies (replicas) of the data that Cassandra will store and an important number for quorum consistency calculations. On Tue, Apr 27, 2010 at 11:14 AM, David Boxenhorn wrote: > Thanks!. er, what is RF? > > > On Tue, Apr 27, 2010 at 6:50 PM, banks wrote: > >>
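Editor's sketch of the quorum arithmetic that the replication factor feeds into. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when the number of replicas read plus the number written exceeds RF. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given RF."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when the read set and the write set must share a replica."""
    return read_replicas + write_replicas > replication_factor
```

With RF = 3, a quorum is 2, so quorum reads combined with quorum writes (2 + 2 > 3) always overlap on at least one replica.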

Re: How to generate 'unique' identifiers for use in Cassandra

2010-04-26 Thread Miguel Verde
http://wiki.apache.org/cassandra/UUID if you don't need transactional ordering, ZooKeeper or something comparable if you do. 2010/4/26 Roland Hänel > Typically, in the SQL world we use things like AUTO_INCREMENT columns that > let us create a unique key automatically if a row is inserted into a

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-24 Thread Miguel Verde
Yes, one should use either the TBufferedTransport or TFramedTransport in Java for performance reasons. These are analogous to the C# Socket classes and you should see a performance improvement from buffering. On Apr 24, 2010, at 5:31 PM, Joost Ouwerkerk wrote: Is this something that al

Re: Best way to store millisecond-accurate data

2010-04-23 Thread Miguel Verde
TimeUUID's time component is measured in 100-nanosecond intervals. The library you use might calculate it with poorer accuracy or precision, but from a storage/comparison standpoint in Cassandra millisecond data is easily captured by it. One typical way of dealing with the data explosion of
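Editor's sketch of the 100-nanosecond time component, using Python's standard `uuid` module rather than any Cassandra client. A version 1 UUID counts 100 ns intervals since 1582-10-15; the helper name `timeuuid_to_datetime` is hypothetical, and the result is truncated to microseconds because that is the finest resolution `datetime` can hold.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Recover the UTC time embedded in a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is the 60-bit count of 100 ns intervals since 1582-10-15.
    ticks_since_unix_epoch = u.time - UUID_EPOCH_OFFSET
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return unix_epoch + timedelta(microseconds=ticks_since_unix_epoch // 10)
```

Calling `timeuuid_to_datetime(uuid.uuid1())` returns a value close to the current UTC time, which shows that millisecond data fits well within the format's resolution.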

Re: running cassandra as a service on windows

2010-04-23 Thread Miguel Verde
https://issues.apache.org/jira/browse/CASSANDRA-292 points to http://commons.apache.org/daemon/procrun.html which is used by other Apache software to implement Windows services in Java. CassandraDaemon conforms to the Commons Daemon spec. On Fri, Apr 23, 2010 at 2:20 PM, Jonathan Ellis wrote: >

Re: Is that normal to have some percent of reads/writes time out?

2010-04-22 Thread Miguel Verde
I see that you are aware of https://issues.apache.org/jira/browse/THRIFT-347 Have you applied the patch there? It worked for the Digg guys (probably the largest PHP user of Cassandra) and others in that JIRA issue. Timeouts are typical with unusually heavy load, node failure, and/or un-tuned param

Re: Does anybody work about transaction on cassandra ?

2010-04-22 Thread Miguel Verde
No, as far as I know no one is working on transaction support in Cassandra. Transactions are orthogonal to the design of Cassandra[1][2], although a system could be designed incorporating Cassandra and other elements a la Google's MegaStore[3] to support transactions. Google uses Paxos, one might

Re: Should I use Cassandra for general purpose DB?

2010-04-21 Thread Miguel Verde
On Wed, Apr 21, 2010 at 12:56 PM, Soichi Hayashi wrote: > So, I am interested in using Cassandra not because of a large amount of data, > but for the following reasons. > > 1) It's easy to administer and handle fail-over (and scale, of course) > 2) Easy to write an application that makes sense

Re: Cassandra data model for financial data

2010-04-21 Thread Miguel Verde
On Wed, Apr 21, 2010 at 12:17 PM, Steve Lihn wrote: > [...] > Design 1: Each attribute is a super column. Therefore each date is a > column. So we have: > > AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 } > AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m } > etc

Re: Filters

2010-04-20 Thread Miguel Verde
http://wiki.apache.org/cassandra/API#get_slice get_slice retrieves the values for either (a) a list of column names or (b) a range of columns, depending on the SlicePredicate you use. It does not allow you to filter a la SQL's WHERE. You would need to create your own index to do so, at least unti

Re: inserting rows in columns inside a supercolumn

2010-04-15 Thread Miguel Verde
Just to nitpick your representation a little bit, columnB/etc... are supercolumnB/etc..., key1/etc... are column1/etc..., and you can probably omit valueA/valueD designations entirely, it would still be understood. Columns in Cassandra always have timestamps, you can't omit them. Can you post a s

Re: framed transport

2010-04-15 Thread Miguel Verde
On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans wrote: > But, if you've enabled framing on the server, you will not > be able to use C# clients (last I checked, there was no framed transport > for C#). There *are* many clients that don't have framed transports, but the C# client had it added in No

Re: Is that possible to write a file system over Cassandra?

2010-04-14 Thread Miguel Verde
On Wed, Apr 14, 2010 at 9:26 PM, Avinash Lakshman < avinash.laksh...@gmail.com> wrote: > OPP is not required here. You would be better off using a Random > partitioner because you want to get a random distribution of the metadata. Not required, certainly. However, it strikes me that 1 cluster i

Re: Is that possible to write a file system over Cassandra?

2010-04-14 Thread Miguel Verde
On Wed, Apr 14, 2010 at 9:15 PM, Ken Sandney wrote: > Large files can be split into small blocks, and the size of block can be > tuned. It may increase the complexity of writing such a file system, but can > be for general purpose (not only for relative small files) Right, this is the path tha
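Editor's sketch of the block-splitting idea from the quoted message: a large file is cut into fixed-size blocks with a tunable block size, and each block is keyed by its index so it could be stored as a separate column or row. The function names are hypothetical, and no Cassandra API is shown; how the blocks are actually written is left open.

```python
from typing import Iterable, Iterator, Tuple


def split_into_blocks(data: bytes, block_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_index, block_bytes) pairs covering `data` in order."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for index, offset in enumerate(range(0, len(data), block_size)):
        # The final block may be shorter than block_size.
        yield index, data[offset:offset + block_size]


def join_blocks(blocks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Reassemble the original bytes from blocks given in any order."""
    return b"".join(chunk for _, chunk in sorted(blocks))
```

Keying blocks by index means they can be fetched out of order, or as a contiguous range, and still be reassembled correctly.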