[Hadoop has a couple of features to reduce this] overhead: task JVM reuse for
running multiple map tasks in one JVM, thereby avoiding some JVM startup
overhead (see the mapred.job.reuse.jvm.num.tasks property), and
MultiFileInputSplit, which can run more than one split per map.
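[Editor's note: a minimal sketch of wiring up both of those knobs in an
old-style (mapred.*) Hadoop job; the class name is illustrative, and the
MultiFileInputFormat subclass is omitted for brevity.]

import org.apache.hadoop.mapred.JobConf;

public class SmallFilesJobSetup {
    public static JobConf configure() {
        JobConf conf = new JobConf(SmallFilesJobSetup.class);

        // Reuse each task JVM for an unlimited number of map tasks (-1)
        // instead of forking a fresh JVM per task (the default is 1).
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

        // To pack many small files into one map, plug in a subclass of
        // org.apache.hadoop.mapred.MultiFileInputFormat, whose splits
        // span several files; conf.setInputFormat(MyFormat.class) would
        // register it.
        return conf;
    }
}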
-----Original Message-----
From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
Sent: 04 March 2013 13:38
To: user@cassandra.apache.org
Subject: Re: Storage question
Well, Astyanax I know can simulate streaming into Cassandra and disperses the
file to multiple rows in the cluster, so you could check that out.
Out of curiosity, why is HDFS not good for a small file size? For reading, it
should be the bomb with RF=3, since you can read from multiple nodes and s[...]
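[Editor's note: a minimal sketch of the Astyanax approach mentioned above,
assuming the astyanax-recipes module; the keyspace handle, the "blob_chunks"
column family, and the object name are placeholders.]

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ChunkedStorage;
import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ObjectMetadata;

public class BlobStoreSketch {
    // 'keyspace' is assumed to be an already-connected Astyanax Keyspace.
    static void roundTrip(Keyspace keyspace, byte[] fileBytes) throws Exception {
        ChunkedStorageProvider provider =
                new CassandraChunkedStorageProvider(keyspace, "blob_chunks");

        // Write: the recipe splits the stream into chunk-sized columns
        // spread over multiple rows in the cluster.
        ObjectMetadata meta = ChunkedStorage.newWriter(provider, "myfile.bin",
                        new ByteArrayInputStream(fileBytes))
                .withChunkSize(512 * 1024)   // 512 KB chunks
                .call();

        // Read it back by reassembling the chunks in order.
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(meta.getObjectSize().intValue());
        ChunkedStorage.newReader(provider, "myfile.bin", out).call();
    }
}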
Each storage system has its own purpose. While Cassandra would be good
for metadata, depending on the size of the objects Cassandra might not be
the best fit; you need something more like Amazon S3 for blob storage.
Try Ceph RADOS or OpenStack Object Store, which both provide an
S3-compatible API.
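[Editor's note: to make the "S3-compatible API" point concrete, a sketch of
pointing the stock AWS Java client at such a store; the endpoint, credentials,
bucket, and file names are all made up.]

import java.io.File;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.S3ClientOptions;

public class S3CompatibleUpload {
    public static void main(String[] args) {
        // Credentials issued by the RADOS gateway / object store, not by AWS.
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        // Point the client at the private store instead of amazonaws.com;
        // path-style addressing avoids per-bucket DNS tricks.
        s3.setEndpoint("http://radosgw.example.com:8080");
        s3.setS3ClientOptions(new S3ClientOptions().withPathStyleAccess(true));

        s3.putObject("my-bucket", "images/photo.bin", new File("photo.bin"));
    }
}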
The best way would be to chunk your binary blobs into 1/2 MB chunks.
You could store:
Key (MD5 of the entire blob) => part1, part2, part3, etc.
BytesType validation
Then if you want the entire image, you just grab the key (MD5)... obviously
you'll need an index somewhere mapping filename => MD5.
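[Editor's note: a minimal sketch of that layout in plain Java; the actual
Cassandra writes are left as comments since they depend on your client, and
the names are illustrative.]

import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class BlobChunker {
    static final int CHUNK_SIZE = 512 * 1024; // 1/2 MB per part

    // Returns the row key (hex MD5 of the whole blob) and fills 'chunks'
    // with the column values for part1, part2, part3, ...
    static String chunk(byte[] blob, List<byte[]> chunks) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(blob);
        StringBuilder key = new StringBuilder();
        for (byte b : digest) {
            key.append(String.format("%02x", b));
        }
        for (int off = 0; off < blob.length; off += CHUNK_SIZE) {
            chunks.add(Arrays.copyOfRange(blob, off,
                    Math.min(off + CHUNK_SIZE, blob.length)));
        }
        // Each chunks.get(i) would be written as column "part" + (i + 1)
        // under row key 'key' in a column family with BytesType validation;
        // a separate index row maps filename => this MD5 key.
        return key.toString();
    }
}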
Best,
Michael