I have a large volume of streaming log data. Each record contains a timestamp, which is essential to the analysis. For example, the data looks like this:
(1) 20:30:21 01/April/2012 AAAAA.............
(2) 20:30:51 01/April/2012 BBBB.............
(3) 21:30:21 01/April/2012 bbbb.............
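To make the record layout concrete, here is a minimal Python sketch of how one such line could be split into a timestamp and a payload. It assumes every record is laid out as "HH:MM:SS DD/Month/YYYY payload" with an English month name; the helper name `parse_record` and the shortened payloads are only illustrative.

```python
from datetime import datetime

def parse_record(line):
    """Split one log line into (timestamp, payload).

    Assumes the layout "HH:MM:SS DD/Month/YYYY payload". The %B directive
    matches full month names in the current locale, so "April" needs an
    English (or C) locale.
    """
    time_part, date_part, payload = line.split(" ", 2)
    stamp = datetime.strptime(f"{time_part} {date_part}", "%H:%M:%S %d/%B/%Y")
    return stamp, payload

# Shortened sample records (the real payloads are longer).
records = [
    "21:30:21 01/April/2012 bbbb",
    "20:30:21 01/April/2012 AAAAA",
    "20:30:51 01/April/2012 BBBB",
]

# Sorting the (timestamp, payload) tuples puts the records in time order.
parsed = sorted(parse_record(r) for r in records)
for stamp, payload in parsed:
    print(stamp.isoformat(), payload)
```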
Moreover, new data arrives every few minutes. I need to calculate the probability that "bbbb" occurs given that "BBBB" has occurred (where BBBB occurs earlier than bbbb), so the analysis is strongly time-dependent. Is Hadoop the right platform for this job? Is there any package available for this kind of work?

Thank you.
Zhiwei
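P.S. To make the calculation concrete, here is a rough single-machine Python sketch of the estimate I have in mind: the fraction of "BBBB" records that are followed by a "bbbb" record. The two-hour `window`, the prefix matching on the payload, and the function names are all assumptions made for illustration, not part of the actual data or of any Hadoop API.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def parse(line):
    # Assumed layout: "HH:MM:SS DD/Month/YYYY payload" (English month names).
    t, d, payload = line.split(" ", 2)
    return datetime.strptime(f"{t} {d}", "%H:%M:%S %d/%B/%Y"), payload

def prob_follow(lines, first="BBBB", later="bbbb", window=timedelta(hours=2)):
    """Estimate P(`later` occurs within `window` after a `first` event).

    The window length is an assumption; the estimate is simply
    (number of `first` events followed by a `later` event) / (number of `first` events).
    """
    events = sorted(parse(l) for l in lines)
    first_times = [ts for ts, p in events if p.startswith(first)]
    later_times = [ts for ts, p in events if p.startswith(later)]
    if not first_times:
        return None  # no conditioning events, so the estimate is undefined
    hits = 0
    for ts in first_times:
        # Index of the earliest `later` event strictly after this `first` event.
        i = bisect_right(later_times, ts)
        if i < len(later_times) and later_times[i] - ts <= window:
            hits += 1
    return hits / len(first_times)

# Shortened sample records; the last one is an invented BBBB with no follow-up.
sample = [
    "20:30:21 01/April/2012 AAAAA",
    "20:30:51 01/April/2012 BBBB",
    "21:30:21 01/April/2012 bbbb",
    "23:00:00 01/April/2012 BBBB",
]
print(prob_follow(sample))
```

Because the matching depends on time order, any distributed version would need the records sorted by timestamp (or grouped into time buckets) before this kind of counting.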
