Re: Storage question

2013-03-05 Thread aaron morton
(Snippet shows only quoted text from the earlier replies below: the mapred.job.reuse.jvm.num.tasks property and MultiFileInputSplit, which can run more than one split per map.)

Re: Storage question

2013-03-04 Thread Hiller, Dean
(Quoting the previous reply:) …overhead: task JVM reuse for running multiple map tasks in one JVM, thereby avoiding some JVM startup overhead (see the mapred.job.reuse.jvm.num.tasks property), and MultiFileInputSplit, which can run more than one split per map.
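To make the quoted mitigations concrete, here is a minimal sketch using the old mapred API. The property name comes from the thread itself; the -1 value ("reuse without limit") and the input-format comment are assumptions about a typical setup, and MyMultiFileInputFormat is a hypothetical subclass:

```java
import org.apache.hadoop.mapred.JobConf;

public class SmallFilesJobConf {

    public static JobConf configure(Class<?> jobClass) {
        JobConf conf = new JobConf(jobClass);

        // Reuse one task JVM for multiple map tasks to amortise JVM startup cost;
        // -1 is conventionally "reuse without limit".
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

        // To pack several small files into one split, an input format built on
        // MultiFileInputSplit (org.apache.hadoop.mapred.MultiFileInputFormat)
        // or its successor CombineFileInputFormat would be set here, e.g.:
        //   conf.setInputFormat(MyMultiFileInputFormat.class);  // hypothetical subclass
        return conf;
    }
}
```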

RE: Storage question

2013-03-04 Thread Kanwar Sangha
…can run more than one split per map. (The remainder of the snippet quotes the mail headers and the opening of Dean Hiller's earlier reply, shown in full below.)

Re: Storage question

2013-03-04 Thread Rustam Aliyev
Each storage system has its own purpose. While Cassandra would be good for the metadata, depending on the size of the objects it may not be the best fit for the blobs themselves. You need something more like Amazon S3 for blob storage. Try Ceph RADOS or the OpenStack Object Store, both of which provide an S3-compatible API.
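For reference, a minimal sketch of what "S3-compatible" means in practice: the standard S3 client is simply pointed at the gateway's endpoint. The credentials, endpoint, bucket, and key below are placeholders, and the client shown is the AWS SDK for Java 1.x:

```java
import java.io.File;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;

public class S3CompatibleBlobStore {

    public static void main(String[] args) {
        // Placeholder credentials and endpoint; point the client at the
        // Ceph RADOS Gateway (or Swift's S3 middleware) instead of AWS.
        AmazonS3Client s3 = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        s3.setEndpoint("http://radosgw.example.com:8080");

        // Store the blob itself in the object store...
        s3.putObject("images", "photos/cat-0001.jpg", new File("/tmp/cat-0001.jpg"));

        // ...while Cassandra keeps only the metadata row
        // (filename, size, checksum, object-store key, etc.).
    }
}
```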

Re: Storage question

2013-03-04 Thread Hiller, Dean
Well, Astyanax, I know, can simulate streaming into Cassandra and disperses the file across multiple rows in the cluster, so you could check that out. Out of curiosity, why is HDFS not good for small file sizes? For reading, it should be the bomb with RF=3, since you can read from multiple nodes and …
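For illustration, a minimal sketch of the Astyanax approach referred to here, using the ChunkedStorage recipe. The class and method names follow the Astyanax recipes documentation of that era and may differ between versions; the column-family name, object name, and chunk sizes are placeholders:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ChunkedStorage;
import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ObjectMetadata;
import com.netflix.astyanax.serializers.StringSerializer;

public class AstyanaxFileStore {

    public static void storeAndRead(Keyspace keyspace, String objectName, String path) throws Exception {
        ColumnFamily<String, String> cfChunks =
                ColumnFamily.newColumnFamily("chunks", StringSerializer.get(), StringSerializer.get());
        ChunkedStorageProvider provider = new CassandraChunkedStorageProvider(keyspace, cfChunks);

        // Write: the recipe splits the stream into fixed-size chunks, each stored
        // separately, so one large file is dispersed across the cluster.
        try (FileInputStream in = new FileInputStream(path)) {
            ChunkedStorage.newWriter(provider, objectName, in)
                    .withChunkSize(512 * 1024)   // 512 KB chunks
                    .call();
        }

        // Read it back by fetching the chunks in order and concatenating them.
        ObjectMetadata meta = ChunkedStorage.newInfoReader(provider, objectName).call();
        ByteArrayOutputStream out = new ByteArrayOutputStream(meta.getObjectSize().intValue());
        ChunkedStorage.newReader(provider, objectName, out)
                .withBatchSize(8)                // chunks fetched per request
                .call();
    }
}
```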

Re: Storage question

2013-03-04 Thread Michael Kjellman
The best way would be to chunk your binary blobs into 1/2 MB chunks. You could store Key (MD5 of the entire blob) => part1, part2, part3, etc., with BytesType validation. Then, if you want the entire image, just grab the key (MD5). Obviously you'll need an index somewhere with filename => MD5. Best, Michael
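A minimal sketch of this chunking scheme in plain Java: it computes the MD5 row key and splits the blob into ~512 KB parts. The actual Cassandra inserts and the filename => MD5 index are left as comments, since the thread does not name a particular client API; the column-family names in those comments are hypothetical:

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class BlobChunker {

    static final int CHUNK_SIZE = 512 * 1024; // ~1/2 MB per chunk, as suggested above

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        byte[] blob = Files.readAllBytes(Paths.get(args[0]));

        // Row key: MD5 of the entire blob.
        String rowKey = md5Hex(blob);

        // Split into part1, part2, part3, ... column values (BytesType).
        int parts = (blob.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        for (int i = 0; i < parts; i++) {
            int from = i * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, blob.length);
            byte[] chunk = Arrays.copyOfRange(blob, from, to);

            // Hypothetical write; substitute your client's insert call, e.g.
            //   blobsCF.insert(rowKey, "part" + (i + 1), chunk)
            System.out.printf("row=%s column=part%d bytes=%d%n", rowKey, i + 1, chunk.length);
        }

        // Separate index row so a filename can be resolved to the MD5 row key, e.g.
        //   filenameIndexCF.insert(args[0], "md5", rowKey)
    }

    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        return String.format("%032x", new BigInteger(1, digest));
    }
}
```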