On 4/13/2018 10:11 AM, Jesus Olivan wrote:
> we're trying to launch a full import of approximately 375 million docs
> from a MySQL database to our solrcloud cluster. Until now, this full
> import process has taken around 24 to 27 hours to finish due to a huge
> import query (several group bys, left joins, etc), but after another
> import query modification (adding more complexity), we're unable to
> execute this full import from MySQL.
>
> We've done some research about migrating to PostgreSQL, but this option is
> not a real option at this time, because it implies a big refactoring from
> several dev teams.
>
> Are there any alternative ways to successfully perform this full import
> process?

DIH is a capable tool, and for what it does, it's remarkably efficient.

It can't really be made any faster, because it's single-threaded.  To
get increased index speed with Solr, you must index documents from
several sources/processes/threads at the same time.  Writing custom
software that can retrieve information from your source, build the
documents you require, and send several update requests simultaneously
will yield the best results.  The source itself may be a bottleneck
though -- this is frequently the case, and Solr is often MUCH faster
than the information source.
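
As a rough illustration of what I mean by custom software, here is a
minimal Python sketch that splits the source into ID ranges and indexes
several ranges at the same time.  Everything specific in it is an
assumption on my part: the update URL and collection name are made up,
fetch() is a hypothetical function you would write to query your
database for one ID range, and it assumes your rows have a numeric ID
you can partition on.  It does not commit; it assumes autoCommit is
configured or that you send a commit when the run finishes.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URL: adjust the host and collection name for your cluster.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def id_ranges(min_id, max_id, step):
    """Split [min_id, max_id] into inclusive (lo, hi) ranges of `step` IDs."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step - 1, max_id)
        yield lo, hi
        lo = hi + 1


def post_batch(docs, url=SOLR_UPDATE_URL):
    """POST a list of documents (dicts) to Solr as a JSON array."""
    req = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def index_range(id_range, fetch, send):
    """Fetch one ID range from the source and send it as one update."""
    lo, hi = id_range
    docs = fetch(lo, hi)  # fetch() is yours: query the DB, build the docs
    if docs:
        send(docs)
    return len(docs)


def index_parallel(min_id, max_id, fetch, send=post_batch,
                   step=1000, workers=8):
    """Index all ID ranges using `workers` threads; return the doc count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: index_range(r, fetch, send),
                          id_ranges(min_id, max_id, step))
        return sum(counts)
```

Because each worker runs its own query for its own range, the database
reads happen in parallel as well as the Solr updates.  The step and
workers values are placeholders; the right numbers depend on how much
load your database and your Solr nodes can take.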

You said that you're unable to execute an updated import from MySQL. 
What exactly happens when you try?  Are there any errors in your solr
logfile?

I'm not going to debate whether MySQL or PostgreSQL is the better
solution.  For my indexes, my source data is in MySQL.  It works well,
but full rebuilds using DIH are slower than I would like -- because it's
single-threaded.  Our overall system architecture would probably be
improved by a switch to PostgreSQL, but it would be an extremely
time-consuming transition process.  We aren't having any real issues
with MySQL, so we have no incentive to spend the required effort.

Thanks,
Shawn
