On 03.01.2015 at 07:07, Srinivasa T N wrote:
> Hi Wilm,
> The reason is that, for auditing purposes, I want to store the
> original files as well.
Well, then I would use an HDFS cluster for storage, as that seems to be
exactly what you need. If you collocate the HDFS DataNodes and YARN's
ResourceManager…
If it's for auditing, I'd recommend pushing the files out somewhere reasonably
external; Amazon S3 works well for this type of thing, and you don't have to
worry too much about backups and the like.
--
Sent from iPhone
> On 3 Jan 2015, at 5:07 pm, Srinivasa T N wrote
Hi Wilm,
The reason is that, for auditing purposes, I want to store the
original files as well.
Regards,
Seenu.
On Fri, Jan 2, 2015 at 11:09 PM, Wilm Schumacher wrote:
> Hi,
>
> perhaps I totally misunderstood your problem, but why "bother" with
> cassandra for storing in the first place?
>
Hi,
perhaps I totally misunderstood your problem, but why "bother" with
Cassandra for storage in the first place?
If your Hadoop MR job is only run once per file (as you wrote above),
why not copy the data directly to HDFS, run your MR job and use
Cassandra as the sink? As HDFS and YARN are…
> Since the Hadoop MR streaming job requires the file to be processed to
> be present in HDFS, I was wondering whether it can get it directly from
> MongoDB instead of me manually fetching it and placing it in a directory
> before submitting the Hadoop job?
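For illustration, a minimal sketch of the copy-to-HDFS-then-run-MR route
suggested above, with Cassandra used only as the sink. The paths, the
streaming jar location and the mapper/reducer scripts are placeholders,
not anything taken from the thread:

import subprocess

# Sketch: push a local XML file into HDFS, then run a Hadoop Streaming job on it.
# All names below (paths, jar location, mapper/reducer scripts) are placeholders.
LOCAL_FILE = "/data/incoming/records.xml"        # hypothetical ~700 MB source file
HDFS_INPUT_DIR = "/user/seenu/xml_input"
HDFS_OUTPUT_DIR = "/user/seenu/xml_output"
STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"  # varies per distro

def run(cmd):
    # Run a shell command and fail loudly on a non-zero exit code.
    subprocess.run(cmd, check=True)

# 1) Copy the raw file into HDFS (-f overwrites an existing copy).
run(["hdfs", "dfs", "-mkdir", "-p", HDFS_INPUT_DIR])
run(["hdfs", "dfs", "-put", "-f", LOCAL_FILE, HDFS_INPUT_DIR])

# 2) Submit the streaming job; mapper.py/reducer.py are hypothetical scripts
#    that parse the XML and emit the extracted values.
run(["hadoop", "jar", STREAMING_JAR,
     "-input", HDFS_INPUT_DIR,
     "-output", HDFS_OUTPUT_DIR,
     "-mapper", "mapper.py",
     "-reducer", "reducer.py",
     "-file", "mapper.py",
     "-file", "reducer.py"])

# 3) A final step (not shown) would read HDFS_OUTPUT_DIR and write the
#    extracted values into Cassandra, i.e. Cassandra acts only as the sink.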
Hadoop M/R can get data directly from…
I agree that Cassandra is a columnar store. Storing the raw XML file,
parsing it with Hadoop and then storing the extracted values happens
only once. The extracted data, on which further operations will be done,
suits the time-series storage provided by Cassandra well, and th…
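As an illustration of the kind of time-series layout meant here, a minimal
sketch using the DataStax Python driver; the keyspace, table and column
names are invented for this example:

from cassandra.cluster import Cluster

# Sketch of a time-series style table for the *extracted* values:
# partitioned by source and day, ordered by event time within a partition.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS audit
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")

# source_id: which XML file / device the value came from
# day:       partition bucket, e.g. '2015-01-03', keeps partitions bounded
# event_ts:  time of the record; value: the extracted value itself
session.execute("""
    CREATE TABLE IF NOT EXISTS audit.extracted_values (
        source_id text,
        day       text,
        event_ts  timestamp,
        value     text,
        PRIMARY KEY ((source_id, day), event_ts)
    ) WITH CLUSTERING ORDER BY (event_ts DESC)
""")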
> Can this split and combine be done automatically by Cassandra when
> inserting/fetching the file, without the application being bothered about it?
There are client libraries which offer recipes for this, but in general,
no.
You're trying to do something with Cassandra that it's not designed to do.
You…
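For what it's worth, a rough sketch of what such a client-side chunking
recipe can look like, using the DataStax Python driver; the keyspace,
table layout and chunk size are assumptions made for the example, not any
particular library's API:

from cassandra.cluster import Cluster

# Sketch: manually split a large file into blob chunks on insert and
# reassemble it on fetch. Table layout and chunk size are assumptions.
CHUNK_SIZE = 8 * 1024 * 1024   # 8 MB, within the "single digits of MB" guidance

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("audit")   # hypothetical keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS raw_files (
        file_id  text,
        chunk_no int,
        data     blob,
        PRIMARY KEY (file_id, chunk_no)
    )
""")

def store_file(file_id, path):
    # Write the file as an ordered sequence of blob chunks.
    insert = session.prepare(
        "INSERT INTO raw_files (file_id, chunk_no, data) VALUES (?, ?, ?)")
    with open(path, "rb") as f:
        chunk_no = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            session.execute(insert, (file_id, chunk_no, chunk))
            chunk_no += 1

def fetch_file(file_id, path):
    # Read the chunks back (they arrive in clustering order) and concatenate.
    rows = session.execute(
        "SELECT data FROM raw_files WHERE file_id = %s", (file_id,))
    with open(path, "wb") as out:
        for row in rows:
            out.write(row.data)

The application still does the splitting and joining itself; Cassandra only
sees ordinary rows.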
On Fri, Jan 2, 2015 at 5:54 PM, mck wrote:
>
> You could manually chunk them down to 64 MB pieces.
>
> Can this split and combine be done automatically by Cassandra when
> inserting/fetching the file, without the application being bothered about it?
>
> > 2) Can I replace HDFS with Cassandra so that I…
> 1) The FAQ … informs that I can have only files of around 64 MB …
See http://wiki.apache.org/cassandra/CassandraLimitations
"A single column value may not be larger than 2GB; in practice, 'single
digits of MB' is a more reasonable limit, since there is no streaming
or random access of blob values."
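To put those limits next to the ~700 MB files from the original question
below, a quick back-of-the-envelope check (chunk sizes chosen only for
illustration):

import math

# Rough chunk counts for a ~700 MB file at two example chunk sizes.
FILE_SIZE_MB = 700
for chunk_mb in (64, 8):
    chunks = math.ceil(FILE_SIZE_MB / chunk_mb)
    print(f"{chunk_mb} MB chunks -> {chunks} chunks per file")
# 64 MB chunks -> 11 chunks per file
# 8 MB chunks -> 88 chunks per file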
Hi All,
The problem I am trying to address is: store the raw files (the files are
in XML format and around 700 MB each) in Cassandra, later fetch them and
process them in a Hadoop cluster, and populate the processed data back
into Cassandra. Regarding this, I wanted a few clarifications:
1) The FAQ (ht…