That was the idea :) Thanks for the update

On Friday, 19 August 2016, Madhav Sharan <[email protected]> wrote:
> Thanks for your suggestion, Daniel. I was already using SequenceFile, but my
> format was poor: I was storing file contents as Text in my SeqFile,
>
> so all my map jobs did repeated conversion from Text to double. I resolved
> this by correcting the SequenceFile format. Now I store a serialised Java
> object in the SeqFile, and my map jobs are faster.
>
> --
> Madhav Sharan
>
>
> On Wed, Aug 17, 2016 at 11:07 PM, Daniel Haviv <[email protected]> wrote:
>
>> Store them within a sequencefile
>>
>>
>> On Thursday, 18 August 2016, Madhav Sharan <[email protected]> wrote:
>>
>>> Hi, can someone please recommend a fast way in Hadoop to store and
>>> retrieve a matrix of double values?
>>>
>>> As of now we store values in text files and read them in Java using an
>>> HDFS input stream and a Scanner. *[0]* These files are actually vectors
>>> representing a video file. Each vector is 883 x 200, and for one map job
>>> we read 4 such vectors, so *the job has to convert 706,400 values to
>>> double*.
>>>
>>> With this approach it takes ~1.5 seconds to convert all these values. I
>>> could use an external cache server to avoid repeated conversion, but I am
>>> looking for a better solution.
>>>
>>> [0] - https://github.com/USCDataScience/hadoop-pot/blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596
>>>
>>>
>>> --
>>> Madhav Sharan
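[Editor's note: the fix discussed above boils down to writing doubles in binary form and reading them back directly, rather than parsing them out of Text. Below is a minimal plain-JDK sketch of that idea; the class and method names are invented for illustration, and there is no Hadoop dependency here. In an actual job, the byte array produced this way could be stored as the BytesWritable value of a SequenceFile record, or the same writeDouble/readDouble calls could live in a custom Writable's write()/readFields() methods.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch: serialize a double matrix as raw binary (rows, cols, then the
// values) so a reader can call readDouble() instead of parsing text.
public class DoubleMatrixIO {

    static byte[] toBinary(double[][] m) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(m.length);        // row count
        out.writeInt(m[0].length);     // column count
        for (double[] row : m)
            for (double v : row)
                out.writeDouble(v);    // 8 bytes per value, no text parsing
        out.flush();
        return bos.toByteArray();
    }

    static double[][] fromBinary(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int rows = in.readInt(), cols = in.readInt();
        double[][] m = new double[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                m[r][c] = in.readDouble();
        return m;
    }

    public static void main(String[] args) throws IOException {
        double[][] matrix = {{1.5, 2.25, -3.0}, {0.0, 4.5, 9.75}};
        double[][] roundTrip = fromBinary(toBinary(matrix));
        for (int r = 0; r < matrix.length; r++)
            for (int c = 0; c < matrix[0].length; c++)
                if (matrix[r][c] != roundTrip[r][c])
                    throw new AssertionError("mismatch at " + r + "," + c);
        System.out.println("round trip ok: " + roundTrip[1][2]);
    }
}
```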
