Re: Full import alternatives

Erick Erickson Fri, 13 Apr 2018 09:45:12 -0700

_how_ are you importing? DIH? SolrJ?

Here's an article about using SolrJ
https://lucidworks.com/2012/02/14/indexing-with-solrj/

But without more details it's really impossible to say much. Things
I've done in the past:
1> use SolrJ and partition the job among a bunch of clients, each
of which works on a subset of the docs. This requires, of course, that
there's a way to partition the import.
2> For joins and the like, I've sometimes been able to cache data in
local storage (in the SolrJ client) and look it up there rather than
doing the joins in the query. This may not be possible, of course,
depending on the size of some of your tables.
3> with DIH, there are some caching capabilities although I confess I
don't know the pros and cons.
4> Work with your DB administrator to tune your query. Sometimes this
means creating a view, sometimes adding indexes, sometimes something
else entirely.
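
To make 1> and 2> a bit more concrete, here's a rough sketch in plain
Java. It is not taken from the article above. It assumes the source table
has a numeric primary key that can be split into contiguous ranges, and
that the joined table is small enough to hold in memory. The class, the
method names and the column names (category_id, category_name) are all
made up for illustration. The actual JDBC query per range and the SolrJ
calls that send the docs are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImportPartitioner {

    /** Inclusive range of primary-key values handled by one client. */
    public static final class IdRange {
        public final long start;
        public final long end;

        IdRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + "]";
        }
    }

    /**
     * Split [minId, maxId] into at most `clients` contiguous ranges of
     * nearly equal size. Each client would then run its own query,
     * e.g. "... WHERE id BETWEEN ? AND ?" (hypothetical column name).
     */
    public static List<IdRange> partition(long minId, long maxId, int clients) {
        if (clients <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need clients > 0 and maxId >= minId");
        }
        List<IdRange> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long base = total / clients;   // minimum ids per client
        long extra = total % clients;  // first `extra` clients get one more
        long start = minId;
        for (int i = 0; i < clients; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // more clients than ids: remaining clients get nothing
            }
            long end = start + size - 1;
            ranges.add(new IdRange(start, end));
            start = end + 1;
        }
        return ranges;
    }

    /**
     * Replace a SQL join with a lookup in a map loaded once up front.
     * `categoryNames` stands in for a small table cached in the client.
     */
    public static Map<String, Object> enrich(Map<String, Object> row,
                                             Map<Long, String> categoryNames) {
        Map<String, Object> doc = new HashMap<>(row);
        Object key = row.get("category_id"); // hypothetical foreign key
        doc.put("category_name", key == null ? null : categoryNames.get(key));
        return doc;
    }

    public static void main(String[] args) {
        // Three clients sharing ids 1..10.
        for (IdRange r : partition(1, 10, 3)) {
            System.out.println(r);
        }
    }
}
```

Ranges split by id are only evenly loaded if the ids are reasonably
dense. With big gaps in the key space you'd want to partition on
something else, or use more and smaller ranges than you have clients.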

Best,
Erick

On Fri, Apr 13, 2018 at 9:11 AM, Jesus Olivan <jesus.oli...@letgo.com> wrote:
> Hi!
>
> We're trying to launch a full import of approximately 375 million docs from a
> MySQL database to our SolrCloud cluster. Until now, this full import
> process has taken around 24 to 27 hours to finish due to a huge import query
> (several GROUP BYs, LEFT JOINs, etc.), but after another modification to the
> import query (adding more complexity), we're unable to execute this full
> import from MySQL.
>
> We've done some research about migrating to PostgreSQL, but this is
> not a real option at this time, because it implies a big refactoring by
> several dev teams.
>
> Are there alternative ways to perform this full import process
> successfully?
>
> Any ideas are welcome :)
>
> Thanks in advance!
