Based on our measurements, Lucene indexing is so CPU-intensive that exploiting
data locality on read wouldn't really help much; the overwhelming bottleneck
would remain the same. That said, we have an ingestion tool in the works that
will also take advantage of data locality for splittable files.

Wolfgang.

On Sep 24, 2014, at 9:38 AM, Tom Chen <tomchen1...@gmail.com> wrote:

> Hi,
> 
> The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
> mapper, so the mapper doesn't run co-located with the input data it processes.
> Isn't this a performance hit?
> 
> Ideally, the morphline mapper should run on the hosts that hold most of the
> data blocks for the input files it processes.
> 
> Regards,
> Tom
