Store them within a SequenceFile.

On Thursday, 18 August 2016, Madhav Sharan <[email protected]> wrote:
> Hi, can someone please recommend a fast way in Hadoop to store and
> retrieve a matrix of double values?
>
> As of now we store values in text files and then read them in Java using
> an HDFS input stream and Scanner. *[0]* These files are actually vectors
> representing a video file. Each vector is 883 x 200, and for one map job we
> read 4 such vectors, so *the job is to convert 706,400 values to double*.
>
> Using this approach we take ~1.5 seconds to convert all these values. I
> can use an external cache server to avoid repeated conversion, but I am
> looking for a better solution.
>
> [0] - https://github.com/USCDataScience/hadoop-pot/blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596
>
> --
> Madhav Sharan
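
To illustrate the SequenceFile suggestion: a SequenceFile holds binary key/value records, so most of the gain would likely come from storing the doubles in binary form instead of parsing text with Scanner. Below is a minimal sketch of that idea using only the JDK. The class name MatrixCodec and the layout (row count, column count, then row-major doubles) are assumptions made for illustration and are not taken from the PoT code. The resulting byte array could then be wrapped in a BytesWritable and used as the value of a SequenceFile record.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs a matrix of doubles into bytes and back.
// Assumed layout: [rows:int][cols:int][rows*cols doubles, row-major].
final class MatrixCodec {

    // Encode a rectangular matrix into a compact binary payload.
    static byte[] encode(double[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        ByteBuffer buf = ByteBuffer.allocate(
                2 * Integer.BYTES + rows * cols * Double.BYTES);
        buf.putInt(rows).putInt(cols);
        for (double[] row : matrix) {
            if (row.length != cols) {
                throw new IllegalArgumentException("matrix must be rectangular");
            }
            for (double v : row) {
                buf.putDouble(v);
            }
        }
        return buf.array();
    }

    // Decode a payload produced by encode(); no text parsing is involved.
    static double[][] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int rows = buf.getInt();
        int cols = buf.getInt();
        double[][] matrix = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                matrix[r][c] = buf.getDouble();
            }
        }
        return matrix;
    }
}
```

With this layout, one 883 x 200 vector takes 8 header bytes plus 8 bytes per value, and reading it back is a straight copy of doubles. Whether this removes the full ~1.5 seconds would need to be measured on the actual job.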
