Re: MRIT's morphline mapper doesn't co-locate with data

2014-09-25 Thread Tom Chen
Do you have the Solr JIRA number for the new ingestion tool? Thanks

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek wrote:
> Based on our measurements, Lucene indexing is so CPU intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. [...]

Re: MRIT's morphline mapper doesn't co-locate with data

2014-09-24 Thread Wolfgang Hoschek
Based on our measurements, Lucene indexing is so CPU intensive that it wouldn’t really help much to exploit data locality on read. The overwhelming bottleneck remains the same. Having said that, we have an ingestion tool in the works that will take advantage of data locality for splittable files

MRIT's morphline mapper doesn't co-locate with data

2014-09-24 Thread Tom Chen
Hi,

The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline mapper. The mapper doesn't co-locate with the input data that it processes. Isn't this a performance hit? Ideally, the morphline mapper should run on the hosts that contain most of the data blocks for the input files it processes.
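To make the "run on the hosts holding most of the blocks" idea concrete, here is a minimal sketch that ranks candidate hosts by how many bytes of a mapper's input files each one stores locally. It assumes each mapper is handed a known set of input files and that per-block replica hosts are already available. The `Block` type and `preferred_hosts` function are hypothetical illustrations, not part of MRIT, NLineInputFormat, or Hadoop. In a real Hadoop job the block lengths and hosts would come from something like `FileSystem.getFileBlockLocations` / `BlockLocation.getHosts()`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical model of one HDFS block: its size and the hosts holding a replica."""
    length: int
    hosts: Tuple[str, ...]


def preferred_hosts(files: Dict[str, List[Block]], top_n: int = 1) -> List[str]:
    """Rank hosts by the number of bytes of `files` they store locally.

    A scheduler could ask for these hosts when placing the mapper that
    processes `files`, so most blocks are read from local disk.
    """
    local_bytes: Dict[str, int] = defaultdict(int)
    for blocks in files.values():
        for block in blocks:
            for host in block.hosts:
                local_bytes[host] += block.length
    # Most local bytes first; ties broken by host name so the result is deterministic.
    ranked = sorted(local_bytes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:top_n]]


# Hypothetical example: two input files, three replicas per block.
MB = 1024 * 1024
inputs = {
    "/data/a.log": [Block(128 * MB, ("node1", "node2", "node3")),
                    Block(64 * MB, ("node2", "node4", "node5"))],
    "/data/b.log": [Block(128 * MB, ("node2", "node3", "node6"))],
}
print(preferred_hosts(inputs, top_n=2))  # ['node2', 'node3']
```

Here node2 holds a replica of every block (320 MB local) and node3 holds 256 MB, so they come out on top. As Wolfgang notes above, whether this kind of placement pays off depends on whether reading input is actually the bottleneck; for CPU-bound Lucene indexing the gain may be small.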