You're looking at a class that is specific to the
MapReduceIndexerTool, which uses EmbeddedSolrServer
to build sub-indexes. The tool then merges these
sub-indexes into indexes suitable for copying to
(or merging with) existing Solr indexes on a
shard-by-shard basis.

If you're using MapReduce to process the files
and want to index them into a live Solr cluster,
just use the regular SolrJ CloudSolrClient:
assemble lists of SolrInputDocuments and send
them to Solr with CloudSolrClient.add(doclist).

I usually use batches of 1,000 for small documents.
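
For example, here's a minimal sketch along those lines (assuming SolrJ 5.x;
the ZooKeeper address, collection name, and field names are placeholders,
not anything from your setup):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
      public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name.
        try (CloudSolrClient client =
                 new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr")) {
          client.setDefaultCollection("mycollection");

          List<SolrInputDocument> batch = new ArrayList<>();
          for (int i = 0; i < 10000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);           // <uniqueKey> field
            doc.addField("title_s", "Document " + i); // placeholder field
            batch.add(doc);

            // Send each batch of 1,000 docs, per the suggestion above.
            if (batch.size() == 1000) {
              client.add(batch);
              batch.clear();
            }
          }
          if (!batch.isEmpty()) {
            client.add(batch); // flush the final partial batch
          }
          client.commit();
        }
      }
    }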

MRIT is intended for initial bulk loading. For
instance, last I knew it does not replace documents
with the same document ID (<uniqueKey>), since it
relies on merging indexes with Lucene's merge
capability, which does not check for duplicate doc IDs.

Best,
Erick

On Mon, Jul 18, 2016 at 1:15 AM, rashi gandhi <gandhirash...@gmail.com> wrote:
> Hi All,
>
> I am using the Solr 5.0.0 API for indexing data in our application, and the
> requirement is to index the data in batches using the solr-map-reduce API.
>
>
> In our application, we may receive data from any type of input source (for
> example, files, streams, or relational and non-relational databases) in a
> particular format, and I need to index this data into Solr using the
> SolrOutputFormat class.
>
>
> From my analysis so far, I find that SolrOutputFormat works with
> EmbeddedSolrServer and requires a path to the config files for indexing
> data, without needing to pass a host and port to create the SolrClient.
>
> I checked the documentation online, but couldn’t find any proper
> examples that make use of the SolrOutputFormat class.
>
> Does anybody have an implementation or a document that mentions
> details such as what exactly needs to be passed as input to the
> SolrOutputFormat configuration?
>
>
> Any pointers would be helpful.
