Hi Harsh, thank you for the quick response. We are currently running with CDH3u2.
I have run MapReduce in several forms on non-closed files:
1. streaming with -mapper /bin/cat
2. word count
3. our own Java job

The output parts are always empty, even though the jobs end successfully. Running hadoop fs -cat on the same input returns results.

Am I doing something wrong?

Niv

On Sun, Mar 4, 2012 at 6:49 PM, Harsh J <[email protected]> wrote:
> Technically, yes, you can run MR jobs on non-closed files (it'll run
> the reader in the same way as your -cat), but you would only be able
> to read until the last complete block, or until the point sync() was
> called on the output stream.
>
> It is better if your file-writer uses the sync() API judiciously to
> mark sync points after a considerable number of records, so that your
> MR readers in tasks read until whole records and not just block
> boundaries.
>
> For a description of the sync() API, read the section 'Coherency Model' in
> Tom White's "Hadoop: The Definitive Guide" (O'Reilly), page 68.
>
> On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <[email protected]> wrote:
> > Hi all,
> >
> > We are looking for a way to run MapReduce on non-closed files.
> > We are currently able to run
> > hadoop fs -cat <non-closed-file>
> >
> > Non-closed files: files that are currently being written to and have
> > not been closed yet.
> >
> > Is there any way to run MapReduce on non-closed files?
> >
> > Thanks in advance for any answer.
> > --
> > Niv Mizrahi
> > Taykey | www.taykey.com
>
> --
> Harsh J

--
*Niv Mizrahi*
Taykey | www.taykey.com
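For anyone following along, the sync() pattern Harsh describes could look roughly like the sketch below on the writer side. This is a minimal, hedged illustration assuming the 0.20-era/CDH3 FSDataOutputStream.sync() call (later Hadoop versions rename this to hflush()); the path, record format, and batch size are made-up examples, not anything from this thread. It requires the Hadoop client jars on the classpath, so it is a sketch rather than a drop-in program.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncingWriter {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical output path for a long-lived, not-yet-closed file.
    FSDataOutputStream out = fs.create(new Path("/logs/live-file"));

    for (int i = 0; i < 100000; i++) {
      out.writeBytes("record-" + i + "\n");

      // Periodically mark a sync point so that readers (including MR
      // tasks) can see complete records up to this point, rather than
      // only up to the last fully written block.
      if (i % 1000 == 999) {
        out.sync();
      }
    }
    out.close();
  }
}
```

Without those periodic sync() calls, a reader that opens the file while it is still being written may see a length of zero (or only whole blocks), which would be consistent with MR jobs producing empty output parts even though a later hadoop fs -cat sees data.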
