Note two things:

1> this runs on Hadoop
2> it ships with the standard Solr release as MapReduceIndexerTool (MRIT);
look in the contribs.

If you're trying to do this yourself, you must be very careful to index
docs to the correct shard and then merge the correct shards. MRIT does all
of this automatically.
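
To make the "correct shard" point concrete, here is a rough sketch of
hash-range routing. It assumes each shard owns one contiguous slice of a
32-bit hash space and uses CRC32 purely as a stand-in hash. Solr's own
router uses a different hash function, so this will not reproduce real
Solr shard assignments; the function names are made up for illustration.

```python
import zlib


def stand_in_hash(doc_id: str) -> int:
    # ASSUMPTION: CRC32 is only a placeholder hash for this illustration.
    # It is NOT the hash Solr uses, so results will not match a real cluster.
    return zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value


def shard_ranges(num_shards: int):
    """Split the 32-bit hash space into contiguous ranges, one per shard."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        hi = 2**32 - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((lo, hi))
    return ranges


def shard_for(doc_id: str, num_shards: int) -> int:
    """Return the index of the shard whose range contains the doc's hash."""
    h = stand_in_hash(doc_id)
    for i, (lo, hi) in enumerate(shard_ranges(num_shards)):
        if lo <= h <= hi:
            return i
    raise AssertionError("hash ranges must cover the whole space")


def partition(doc_ids, num_shards: int):
    """Group doc ids by the shard they would have to be indexed into."""
    buckets = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        buckets[shard_for(doc_id, num_shards)].append(doc_id)
    return buckets
```

An offline index built for a given shard should only contain docs whose
hash falls in that shard's range; otherwise the cluster will route later
requests for those ids to a different shard than the one holding them.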

Additionally, it has the cool feature that if (and only if) your Solr
index is running over HDFS, the --go-live option will automatically merge
the indexes into the appropriate running Solr instances.

One caveat: this tool doesn't handle _updating_ documents. So if you run
it twice on the same data set, you'll have two copies of every doc. It's
designed as a bulk initial-load tool.
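
A toy model of that caveat, assuming an index is just a list of docs and
"id" is the unique key. Both functions are hypothetical illustrations,
not Solr or MRIT code: one merges a batch in as-is, the other replaces
docs that share an id the way a normal update would.

```python
def append_only_merge(index, batch):
    """Model of a bulk merge: docs are added as-is, with no lookup by id."""
    return index + list(batch)


def update_by_id(index, batch, key="id"):
    """Model of a normal update: a doc with an existing id replaces the old one."""
    by_id = {doc[key]: doc for doc in index}
    for doc in batch:
        by_id[doc[key]] = doc
    return list(by_id.values())


docs = [{"id": "1", "title": "a"}, {"id": "2", "title": "b"}]

# Loading the same data set twice through each path:
merged_twice = append_only_merge(append_only_merge([], docs), docs)
updated_twice = update_by_id(update_by_id([], docs), docs)

print(len(merged_twice))   # 4: every doc is present twice
print(len(updated_twice))  # 2: the second load overwrote the first
```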

Best,
Erick



On Thu, Nov 19, 2015 at 11:45 AM, KNitin <nitin.t...@gmail.com> wrote:
> Great. Thanks!
>
> On Thu, Nov 19, 2015 at 11:24 AM, Sameer Maggon <sam...@measuredsearch.com>
> wrote:
>
>> If you are trying to create a large index and want speedups there, you
>> could use the MapReduceIndexerTool -
>> https://github.com/cloudera/search/tree/cdh5-1.0.0_5.2.1/search-mr. At a
>> high level, it takes your files (csv, json, etc.) as input and can create
>> either a single or a sharded index that you can then copy to your Solr
>> servers. I've used this to create indexes that include hundreds of millions
>> of documents in a fairly decent amount of time.
>>
>> Thanks,
>> --
>> *Sameer Maggon*
>> Measured Search
>> www.measuredsearch.com <http://measuredsearch.com/>
>>
>> On Thu, Nov 19, 2015 at 11:17 AM, KNitin <nitin.t...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I was wondering if there are existing tools that will generate a Solr
>> > index offline (in SolrCloud mode) that can later be loaded into
>> > SolrCloud, before I decide to implement my own. I found some tools that
>> > do only Solr-based index loading (non-ZK mode). Is there one with ZK
>> > mode enabled?
>> >
>> >
>> > Thanks in advance!
>> > Nitin
>> >
>>
