Pardon me, what exactly IS Big Data :)

yeah, a smiley, but I think it's worth trying to put words to it.

mostly, I think BD is really "Many Data": it's not really about
the absolute scale.  if I run a big simulation that writes 10 TB
of checkpoints every cycle, that's reasonably large data, but in
a sense I've got just one unit of data per node, so not really
"many".  or if I'm doing lookups in some giant business DB, the
tables may be quite large, but I'm probably doing low-cardinality
selects and joins (indices FTW!), so again not really "many".

in a sense, you have BD when your data and performance needs control
the design of your cluster.  you may have a very traditional DB that's
implemented across more than one node, but it's probably not
a thousand nodes with gigabit - the latter is probably BD.

I often think of BD and Data Mining as being quite closely linked.
But I don't think I'd want to say that all BD is for DM...

regards, mark hahn.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
