On 3/23/2016 2:36 AM, fabigol wrote:
> I want to do indexing with the SolrJ API, so I believe the indexing
> will be multithreaded.
> But I have 5 root entities.

The config you included is from the dataimport handler.  This is *NOT*
indexing with SolrJ.  You can use SolrJ to *start* the indexing, and with
enough development effort you can even monitor the import status, but
the indexing will be done by Solr itself with the dataimport handler.
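
As a rough sketch of what "start" and "monitor" mean in practice: DIH is
driven by requests to its handler, with command=full-import to start an
import and command=status to poll it.  The snippet below only builds
those URLs with the plain JDK rather than SolrJ.  The base URL, the core
name "mycore", and the "/dataimport" handler path are assumptions (the
path is whatever your solrconfig.xml registers), and the DihUrls class is
made up for illustration.

```java
import java.net.URI;

class DihUrls {
    // Builds the URL for a DIH command such as "full-import" or "status".
    // Assumes the handler is registered at "/dataimport" in solrconfig.xml.
    static URI command(String solrBaseUrl, String core, String command) {
        // Tolerate a trailing slash on the base URL.
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return URI.create(base + "/" + core + "/dataimport?command=" + command);
    }

    public static void main(String[] args) {
        // Hypothetical host and core name.
        String solr = "http://localhost:8983/solr";
        System.out.println(command(solr, "mycore", "full-import"));
        System.out.println(command(solr, "mycore", "status"));
    }
}
```

Either way the import itself still runs inside Solr, on one thread; the
client only kicks it off and polls.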

The dataimport handler is single-threaded, and it would be a major
effort to change that.  There used to be a multi-threaded option, but it
was removed because it didn't work.

The DIH config that you included has one outer entity which encloses two
inner entities.  For every single row returned by the outer entity, an
individual SQL query will be sent for each of the inner entities.  You
have caching on the inner entities, which might be able to eliminate
some of those queries, but if those inner entities have a lot of rows,
then it will take massive amounts of memory for those caches.

If you want multi-threaded indexing in a Java program, you need to use
SolrJ and JDBC to write a program that talks to your database and then
uses multiple threads to send indexing requests to Solr.  I would
recommend that you avoid ConcurrentUpdateSolrClient, even though it can
easily solve the multi-threaded aspect for you, because if you use CUSC,
your program will not be able to tell you when indexing fails.
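
A minimal sketch of that approach, using only the JDK so it runs on its
own: documents are split into batches, the batches are sent from a fixed
thread pool, and every failure is surfaced to the caller.  The
ParallelIndexer class and the BatchSender hook are hypothetical names; in
a real program the hook would build SolrInputDocument objects from your
JDBC rows and hand them to SolrClient.add().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelIndexer {
    // Hypothetical hook: a real implementation would call
    // SolrClient.add(batch) here and let any exception propagate.
    interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    // Splits docs into batches, sends them on a fixed-size thread pool,
    // and returns how many batches failed.
    static <T> int indexAll(List<T> docs, int batchSize, int threads,
                            BatchSender<T> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < docs.size(); i += batchSize) {
                int end = Math.min(i + batchSize, docs.size());
                List<T> batch = new ArrayList<>(docs.subList(i, end));
                Callable<Void> task = () -> {
                    sender.send(batch);
                    return null;
                };
                pending.add(pool.submit(task));
            }
            int failed = 0;
            for (Future<Void> f : pending) {
                try {
                    f.get(); // rethrows whatever the sender threw
                } catch (ExecutionException e) {
                    failed++;
                    System.err.println("Batch failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while indexing", e);
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            docs.add(i);
        }
        // Stand-in sender that rejects any batch containing 7.
        int failed = indexAll(docs, 3, 2, batch -> {
            if (batch.contains(7)) {
                throw new IllegalStateException("simulated indexing error");
            }
        });
        System.out.println("failed batches: " + failed);
    }
}
```

The point of collecting the Future objects is that each failure comes
back to your own code, where you can log it, retry the batch, or abort.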

Thanks,
Shawn
