On Wed, Jun 23, 2010 at 12:18 AM, David Boxenhorn wrote:
> Tatu, I did read your comments - and I appreciate them very much!
>
> I want someone to argue with me (using good arguments) since what I'm doing
> *does* seem weird to me - because no one else is doing it.
>
> What I mean by readable is t
On Tue, Jun 22, 2010 at 11:54 PM, David Boxenhorn wrote:
> Having a physical location encoded in the UUID *increases* the chance of a
> collision, because it means fewer random bits. There definitely will be more
> than one UUID created in the same clock unit on the same machine! The same
> bits t
On Tue, Jun 22, 2010 at 9:12 AM, David Boxenhorn wrote:
> A little bit of time fuzziness on the order of a few milliseconds is fine
> with me. This is user-generated data, so it only has to be time-ordered at
> the level that a user can perceive.
Ok, so mostly ordered. :-)
> I have no worries ab
On Tue, Jun 22, 2010 at 5:58 AM, David Boxenhorn wrote:
> I want to use UUIDs whose alphanumeric order is the same as their
> chronological order. So I'm generating Version 4 UUIDs (
...
> Is there anything wrong with this idea?
If you want to keep it completely ordered, it's probably not enough
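
A rough sketch of the idea discussed above, for illustration only. The quoted message is cut off, so the layout here is an assumption: a 48-bit millisecond timestamp in the leading bits, followed by 80 random bits. The helper name time_prefixed_uuid is hypothetical, and the result is not a spec-compliant Version 4 UUID. Because the string form is fixed-width lowercase hex, sorting the strings matches sorting by timestamp; two ids created in the same millisecond fall in random order, which is the "mostly ordered" case.

```python
import os
import time
import uuid


def time_prefixed_uuid(millis: int) -> uuid.UUID:
    """Hypothetical helper: 48-bit millisecond timestamp, then 80 random bits."""
    if not 0 <= millis < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    random_bits = int.from_bytes(os.urandom(10), "big")  # 10 bytes = 80 bits
    # Timestamp occupies the most significant bits, so it dominates ordering.
    return uuid.UUID(int=(millis << 80) | random_bits)


def new_id() -> uuid.UUID:
    """Build an id from the current wall-clock time in milliseconds."""
    return time_prefixed_uuid(time.time_ns() // 1_000_000)
```

Fewer random bits means a higher collision chance within one clock unit, which is the trade-off raised elsewhere in this thread.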
On Fri, Jun 18, 2010 at 4:57 PM, Miguel Verde wrote:
> On Fri, Jun 18, 2010 at 6:23 PM, Tatu Saloranta
> wrote:
>>
>> Not that I wanted to criticize choices, but do they actually allow use
>> of JSON as encoding?
>> Avro does use JSON for specifying schemas, but I
On Fri, Jun 18, 2010 at 2:12 PM, Eric Evans wrote:
> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>> At the risk of asking about religion (but with no interest in hearing
>> about it), why Avro instead of something like plain-old-JSON over
>> HTTP?
>
> At the risk of having this thread vee
On Tue, Jun 8, 2010 at 1:28 AM, David Boxenhorn wrote:
> As I said above, I was wondering if I could come up with a robust algorithm,
> e.g. creating the new super columns and then attaching them at the end,
> which will not FUBAR my index if it fails.
>
Is this append-only? That is, never delete
On Tue, Jun 8, 2010 at 12:07 AM, David Boxenhorn wrote:
> I am not worried about getting the occasional wrong result - if I were, I
> couldn't use Cassandra. I am only worried about breaking the index as a
> whole. If concurrent changes to the tree happen to modify the same record, I
> don't mind
On Mon, Jun 7, 2010 at 3:09 PM, Ian Soboroff wrote:
> I was going to say, if ordered trees are your problem, Cassandra is not your
> solution. Try building something with Berkeley DB.
Also -- while there are no official plans for this, there have been
discussions on Voldemort list, wrt. possible
On Mon, Jun 7, 2010 at 12:06 AM, David Boxenhorn wrote:
> I wonder if there is a robust algorithm for maintaining b-trees that doesn't
> require atomicity? How about if you create the three new super columns
> first, then attach them to the parent, then delete the old super column? If
> it fails,
Yeah, or maybe just "clustering", since there is no branching structure.
It's quite commonly useful even on regular b-tree style storage (BDB
et al), as it can reduce per-entry overhead quite a bit. And allows
very efficient compression, if entries have lots of redundancy (xml or
json serialized da
On Tue, May 25, 2010 at 4:04 AM, Mark Greene wrote:
> I'm fairly certain the write path hits the commit log first, then the
> memtable.
True, but that does not make them any less sequential -- journal logs
are strictly sequential fast writes. Actual ordering occurs in memory,
and results are even
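
A toy model of the write path described above (commit log first, then memtable), only to show why the disk writes stay sequential regardless of key order. The class and method names are made up, and this is not how Cassandra is implemented: the journal is a plain list, the memtable is a dict, and sorting happens only at flush time.

```python
class ToyStore:
    """Illustrative only: append-only journal plus an in-memory table."""

    def __init__(self):
        self.commit_log = []  # stands in for the sequential on-disk journal
        self.memtable = {}    # stands in for the in-memory structure

    def write(self, key, value):
        # 1. Append to the journal in arrival order (sequential write).
        self.commit_log.append((key, value))
        # 2. Update the in-memory table; ordering is resolved in memory.
        self.memtable[key] = value

    def flush(self):
        # Emit the entries in key order, as one sequential pass, and reset.
        sorted_entries = sorted(self.memtable.items())
        self.memtable = {}
        return sorted_entries
```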
>> Later,
>> Jeff
>>
>> On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta
>> wrote:
>>>
>>> On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
>>> wrote:
>>> > There are forEach methods in that would allow you to travel the
>>>
On Tue, Apr 27, 2010 at 10:49 PM, Jeff Zhang wrote:
> Mark,
>
> Thanks for your suggestion. It's really not a good idea to store one
> file in multiple columns in one row. The heap space problem will still
> exist. I took your advice to store it in multiple rows, and it works;
> I can even store
On Sun, Apr 25, 2010 at 5:43 PM, Jonathan Ellis wrote:
> On Sun, Apr 25, 2010 at 5:40 PM, Tatu Saloranta wrote:
>>> Now with TimeUUIDType, if two UUID have the same timestamps, they are
>>> ordered
>>> by bytes order.
>>
>> Naively for the whol
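
The ordering described in the quoted text (timestamp first, byte order to break ties) can be sketched as a sort key. This uses Python's standard uuid module purely as an illustration; it is not Cassandra's TimeUUIDType comparator, and the function name is hypothetical.

```python
import uuid


def time_uuid_sort_key(u: uuid.UUID):
    """Sort key: 60-bit timestamp first, raw bytes as the tie-breaker."""
    if u.version != 1:
        raise ValueError("expected a time-based (version 1) UUID")
    return (u.time, u.bytes)


def sort_time_uuids(uuids):
    """Return the UUIDs ordered by timestamp, then by byte order."""
    return sorted(uuids, key=time_uuid_sort_key)
```

Comparing the raw 16 bytes alone would not give time order, because a version 1 UUID stores the low bits of the timestamp first.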
On Mon, Apr 26, 2010 at 10:35 AM, Ethan Rowe wrote:
> On 04/26/2010 01:26 PM, Isaac Arias wrote:
>>
>> On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
>>
...
>> In my opinion, a mapping solution for Cassandra should be more like a
>> Template. Something that helps map (back and forth) rows to
On Sat, Apr 24, 2010 at 2:08 AM, Sylvain Lebresne wrote:
> On Sat, Apr 24, 2010 at 12:53 AM, Jesse McConnell
> wrote:
>> try LexicalUUIDType, that will distinguish the secs correctly
>>
>> imo based on the existing impl (last I checked at least) TimeUUIDType
>> was equivalent to LongType
>
> It u
On Sat, Apr 24, 2010 at 6:27 AM, Carlos Sanchez
wrote:
> There are forEach methods in that would allow you to travel the
> keys/values/entries w/o creating the extra object (entries)
Ok. So if change was made, it'd make sense to ensure those were used
for traversal. Thanks!
-+ Tatu +-
On Fri, Apr 23, 2010 at 1:22 PM, Carlos Sanchez
wrote:
> I will try to modify the code... what I like about Trove is that even for
> regular maps (non primitive) there are no Entry objects created so there are
> much less references to be gced
This could help, but how is iteration then handled?
On Mon, Apr 19, 2010 at 7:12 PM, Brandon Williams wrote:
> On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang wrote:
>>
>> 2. Reject the request when short of resources, instead of throwing OOME
>> and exiting (crashing).
>
> Right, that is the crux of the problem. It will be addressed here:
> https://iss
On Fri, Apr 16, 2010 at 4:08 AM, Mark Robson wrote:
> On 15 April 2010 02:42, Zhuguo Shi wrote:
>>
>> Hi,
>> Cassandra has a good distributed model: decentralized, auto-partition,
>> auto-recovery. I am evaluating writing a file system over Cassandra
>> (like CassFS: http://github.com/jdarc
On Fri, Apr 16, 2010 at 9:17 AM, Mike Gallamore
wrote:
> On 04/16/2010 01:38 AM, dir dir wrote:
>
> I hear Facebook.com and twitter.com are using the Cassandra database. In my
> opinion Facebook and Twitter have hundreds of TB of data, because their
> users number in the hundreds of millions.
>
> I think you might
On Wed, Apr 14, 2010 at 7:26 PM, Avinash Lakshman
wrote:
> OPP is not required here. You would be better off using a Random partitioner
> because you want to get a random distribution of the metadata.
Not for splitting, but for actual file system hierarchy it would. How
else would you traverse hi
On Wed, Apr 14, 2010 at 6:42 PM, Zhuguo Shi wrote:
> Hi,
> Cassandra has a good distributed model: decentralized, auto-partition,
> auto-recovery. I am evaluating writing a file system over Cassandra
> (like CassFS: http://github.com/jdarcy/CassFS ), but I don't know if
> Cassandra is good a
On Mon, Apr 12, 2010 at 3:34 PM, Olexiy Prokhorenko
wrote:
> Hello,
>
> Asked this question on Stack Overflow
> (http://stackoverflow.com/questions/2619744/searches-and-general-querying-with-hbase-and-or-cassandra-best-practices)
> but didn't get many answers. Maybe some Cassandra people can he
On Wed, Apr 7, 2010 at 1:51 PM, Eric Evans wrote:
> On Tue, 2010-04-06 at 10:55 -0700, Tatu Saloranta wrote:
>> On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight
>> wrote:
>> > On import, all data in the JSON file is loaded into memory, so you
>> cannot
>> >
On Tue, Apr 6, 2010 at 2:12 PM, Steve wrote:
...
> Should I assume that it isn't common practice to write updates
> atomically in-real time, and batch process them 'off-line' to increase
> the atomic granularity? It seems an obvious strategy... possibly one
> for which an implementation might use
On Tue, Apr 6, 2010 at 8:45 AM, Mike Malone wrote:
>> As long as the conflict resolver knows that two writers each tried to
>> increment, then it can increment twice. The conflict resolver must know
>> about the semantics of "increment" or "decrement" or "string append" or
>> "binary patch" or wha
On Tue, Apr 6, 2010 at 8:17 AM, Jonathan Ellis wrote:
> On Tue, Apr 6, 2010 at 2:13 AM, Ilya Maykov wrote:
>> That does sound similar. It's possible that the difference I'm seeing
>> between ConsistencyLevel.ZERO and ConsistencyLevel.ALL is simply due
>> to the fact that using ALL slows down the
On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight wrote:
> On import, all data in the JSON file is loaded into memory, so you cannot
> import large data.
> You need to export the large sstable file to many small JSON files, and then
> run the import.
Why would you ever read the whole file in memory? JSON is
On Tue, Apr 6, 2010 at 10:12 AM, Steve wrote:
> On 06/04/2010 15:26, Eric Evans wrote:
...
> I've read all about QUORUM, and it is generally useful, but as far as I
> can tell, it can't give me a transaction...
Correct. Only individual operations are atomic, and ordering of
insertions is not guar
On Tue, Apr 6, 2010 at 8:06 AM, Shuge Lee wrote:
>> 'girls': pickle.dumps(['java', 'actionscript', 'python'])
>
> I think this is a really bad idea, I can't do any search if using Pickle.
Just to be sure: are you thinking of traditional queries, lookups by
values (find entries that have certa
On Mon, Apr 5, 2010 at 5:10 PM, Paul Prescod wrote:
> On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta wrote:
>> ...
>>
>> I would think that there is also possibility of losing some
>> increments, or perhaps getting duplicate increments?
>
> I believe that with v
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod wrote:
> On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone wrote:
>>> That's useful information Mike. I am a bit curious about what the most
>>> common use cases are for atomic increment/decrement. I'm familiar with
>>> atomic add as a sort of locking mechan
On Thu, Apr 1, 2010 at 9:43 PM, Jeremy Davis
wrote:
>
> You are correct, it is not a queue in the classic sense... I'm storing the
> entire "conversation" with a client in perpetuity, and then playing it back
> in the order received.
>
> Rabbitmq/activemq etc all have about the same throughput 3-6
On Thu, Apr 1, 2010 at 8:27 AM, Rao Venugopal wrote:
> To Cao Jiguang
>
> I was watching this presentation on bigtable yesterday
> http://video.google.com/videoplay?docid=7278544055668715642#
>
> and Jeff mentioned that they compared three different compression libraries
> BMDiff, LZO and gzip.
On Mon, Mar 29, 2010 at 10:31 AM, Matthew Stump wrote:
> Am I crazy to want to switch our server's primary data store from postgres to
> cassandra? This is a system used by banks and governments to store crypto
> keys which absolutely can not be lost.
Back to original question: in my completel
On Mon, Mar 29, 2010 at 5:57 PM, Jonathan Ellis wrote:
> Does http://wiki.apache.org/cassandra/FAQ#range_ghosts help?
Thank you for quick answer, and apologies for missing this entry.
So if I understand entry correctly, answer is yes, they need to be
explicitly handled by Cassandra.
Which means
Quick question: Cassandra documentation explains implementation of
deletes (using tombstones) quite well.
But what I was not quite sure about was what actual effects
existing tombstones might have on range queries that would
include those tombstones.
That is: for a use case where new entri
On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert wrote:
> So, what does "anti-entropy repair" do then?
Fix discrepancies between live nodes? (caused by transient failures presumably)
> Sounds like you have to 'decommission' the dead node, then I thought run
> 'nodeprobe repair' to get the data adj
On Thu, Mar 25, 2010 at 9:20 AM, Benjamin Black wrote:
> Cassandra is not being used to generate the Twitter identifiers.
> Twitter, like most places using Cassandra, has more than one database
> system in production.
>
> UUIDs are not at risk of conflicts with billions of rows.
Exactly: UUIDs we
On Wed, Mar 24, 2010 at 8:45 AM, Ran Tavory wrote:
> I concur with Eric, as hector developer it's easier to develop separately
> (github) plus competition keeps us healthy ;)
Enthusiastic +1 for this :)
(both for proper layering to allow different levels of abstraction,
and for goodness of some c
On Fri, Mar 19, 2010 at 11:25 AM, Stu Hood wrote:
> All write patterns should provide the same performance with Cassandra, since
> all writes to disk occur sequentially.
Ok that makes sense.
> The only variance might be in the data structure used for the Memtable (a
> concurrent skip list), bu
On Fri, Mar 19, 2010 at 10:56 AM, Jonathan Ellis wrote:
> On Fri, Mar 19, 2010 at 12:52 PM, Tatu Saloranta wrote:
>> One sort of related question: given that order of insertions has huge
>> effects on some stores, like BDB (where inserting in key order is 10x
>> faster
On Fri, Mar 19, 2010 at 7:40 AM, Marcin wrote:
> Hi guys,
>
> is there a way to avoid compacting, flushing and all of these things on
> startup and perform them while the node is running?
>
> It takes a lot of time on startup.
One sort of related question: given that order of insertions has huge
effects on s
On Thu, Mar 18, 2010 at 7:31 AM, Vick Khera wrote:
> On Thu, Mar 18, 2010 at 9:15 AM, Bill Au wrote:
>> In theory there is a breaking point somewhere, right?
>
> I don't think google has hit it yet, so I'd have to say nobody has
> reached "the breaking point" yet
>
> What do the big places do