I have large volume of stream log data. Each data record contains a time
stamp, which is very important to the analysis.
For example, I have data format like this:
(1) 20:30:21 01/April/2012    AAAAA.............
(2) 20:30:51 01/April/2012    BBBB.............
(3) 21:30:21 01/April/2012    bbbb.............

Moreover, new data comes every few minutes.
I have to calculate the probability of the occurrence "bbbb" given the
occurrence of "BBBB" (where BBBB occurs earlier than bbbb). So, it is
really time-dependant.

I wonder if Hadoop  is the right platform for this job? Is there any
package available for this kind of work?

Thank you.

Zhiwei

Reply via email to