You're looking at a class that is specific to the MapReduceIndexerTool, which uses EmbeddedSolrServer to build sub-indexes. These sub-indexes are then merged by the tool into indexes suitable for copying to (or merging with) existing Solr indexes on a shard-by-shard basis.
If you're using MapReduce to process the files and want to index them into a live Solr cluster, just use the regular SolrJ CloudSolrClient: assemble lists of SolrInputDocuments and send them to Solr with CloudSolrClient.add(doclist). I usually use batches of 1,000 for small documents. A minimal sketch of that pattern follows the quoted message below.

MRIT is intended for initial bulk loading. Last I knew, for instance, it does not replace documents with the same document ID (<uniqueKey>), since it relies on merging the index with Lucene's merge capability, which does not check for duplicate doc IDs.

Best,
Erick

On Mon, Jul 18, 2016 at 1:15 AM, rashi gandhi <gandhirash...@gmail.com> wrote:
> Hi All,
>
> I am using the Solr 5.0.0 API for indexing data in our application, and the
> requirement is to index the data in batches, using the solr-mapreduce API.
>
> In our application, we may receive data from any type of input source, for
> example: files, streams, or any other relational or non-relational DBs, in a
> particular format. And I need to index this data into Solr using the
> SolrOutputFormat class.
>
> As per my analysis so far, I find that SolrOutputFormat works with
> EmbeddedSolrServer and requires the path to config files for indexing data,
> without the need to pass a host and port for creating the SolrClient.
>
> I checked the documentation online, but couldn't find any proper
> examples that make use of the SolrOutputFormat class.
>
> Does anybody have an implementation or a document that mentions
> details like what exactly needs to be passed as input to the
> SolrOutputFormat configuration, etc.?
>
> Any pointers would be helpful.
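
For anyone finding this thread later, here is a minimal sketch of the batching approach Erick describes above, written against the Solr 5.x SolrJ API (later versions construct the client via CloudSolrClient.Builder instead). The ZooKeeper address, collection name, and the "title_s" field are placeholders; adjust them to your cluster and schema.

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble; point this at your own cluster.
        try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181")) {
            client.setDefaultCollection("collection1"); // placeholder collection name

            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", UUID.randomUUID().toString());
                doc.addField("title_s", "document " + i); // placeholder field
                batch.add(doc);

                // Send in batches of 1,000, as suggested above.
                if (batch.size() == 1000) {
                    client.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                client.add(batch); // flush any remainder
            }
            client.commit();
        }
    }
}

Depending on the autoCommit/autoSoftCommit settings in solrconfig.xml, the explicit commit() at the end may be unnecessary; committing once after the bulk load rather than per batch keeps the load fast.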