Hey Shawn,

Unfortunately, we can't upgrade the existing cluster.  That was my first
approach as well.

Yes, we are using SolrEntityProcessor, so the import ends up deep paging once
it gets past a certain number of rows.

I have observed that importing only four hours' worth of data at a time,
instead of a larger period, makes the import much faster.  Since we are
importing several months of data, it would be nice if the data import could
be scripted in bash or Python, but I can't find any documentation on it.
Any pointers?
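DIH is driven entirely by HTTP requests to its handler, so a loop over
four-hour windows can be scripted. Below is a minimal Python sketch under
several assumptions: the handler lives at the hypothetical `DIH_URL` shown,
and the SolrEntityProcessor `query` attribute in data-config.xml reads
hypothetical `start` and `end` request parameters through
`${dataimporter.request.start}` and `${dataimporter.request.end}` (for
example `query="timestamp:[${dataimporter.request.start} TO ${dataimporter.request.end}}"`,
where `timestamp` is a hypothetical date field). `clean=false` matters here,
because `full-import` cleans the index by default.

```python
import json
import time
import urllib.parse
import urllib.request
from datetime import datetime, timedelta

# Hypothetical DIH endpoint on the target (6.5.0) core; adjust host, core and path.
DIH_URL = "http://localhost:8983/solr/target_core/dataimport"


def windows(start, end, step=timedelta(hours=4)):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        yield cur, nxt
        cur = nxt


def solr_ts(dt):
    """Format a naive UTC datetime in Solr's ISO-8601 date syntax."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def dih_get(params):
    """Send a GET request to the DIH handler and return the parsed JSON."""
    query = urllib.parse.urlencode(dict(params, wt="json"))
    with urllib.request.urlopen(DIH_URL + "?" + query) as resp:
        return json.load(resp)


def import_window(win_start, win_end, poll_seconds=10):
    """Run one full-import for a single window and wait for it to finish."""
    dih_get({
        "command": "full-import",
        "clean": "false",   # the default (true) would wipe earlier windows
        "commit": "true",
        # Hypothetical custom parameters, read in data-config.xml as
        # ${dataimporter.request.start} and ${dataimporter.request.end}.
        "start": solr_ts(win_start),
        "end": solr_ts(win_end),
    })
    # Poll the status command until DIH reports it is idle again.
    while True:
        time.sleep(poll_seconds)
        status = dih_get({"command": "status"})
        if status.get("status") == "idle":
            return status.get("statusMessages", {})


def import_range(start, end):
    """Import [start, end) one four-hour window at a time."""
    for win_start, win_end in windows(start, end):
        messages = import_window(win_start, win_end)
        print(solr_ts(win_start), solr_ts(win_end), messages)
```

Calling `import_range(datetime(2017, 1, 1), datetime(2017, 4, 1))` would run
one import per window and wait for each to go idle before starting the next,
so two imports never overlap on the same handler.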

------------------------------
*From:* Shawn Heisey <apa...@elyograg.org>
*Sent:* Thursday, April 27, 2017 5:07 PM
*To:* solr-user@lucene.apache.org
*Subject:* Re: DIH Speed

On 4/27/2017 5:40 PM, Erick Erickson wrote:
> I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep
> paging is querying.
>
> If it's querying, consider cursorMark or the /export handler.
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
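For the querying side, cursorMark iteration looks roughly like the Python
sketch below. It assumes a source server at 4.7 or later (so not the 4.5
server in this thread), a hypothetical `SELECT_URL`, and a uniqueKey field
named `id`. The sort must include the uniqueKey, and the loop stops when
Solr hands back the same `nextCursorMark` it was sent. The `fetch` argument
is only there so the paging logic can be exercised without a live server.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical source core URL; cursorMark requires Solr 4.7 or later.
SELECT_URL = "http://source-host:8983/solr/source_core/select"


def http_fetch(params):
    """Query the /select handler and return the parsed JSON response."""
    url = SELECT_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iterate_with_cursor(fetch=http_fetch, query="*:*", rows=500, unique_key="id"):
    """Yield every matching document using cursorMark instead of start/rows."""
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        data = fetch({
            "q": query,
            "rows": rows,
            "sort": unique_key + " asc",  # uniqueKey is required in the sort
            "cursorMark": cursor,
            "wt": "json",
        })
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            return
        cursor = next_cursor
```

Unlike `start`/`rows`, each request only has to find the next `rows`
documents after the cursor position, so the cost per page stays flat
instead of growing with the offset.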

Very likely they are using SolrEntityProcessor.

Vijay, if the source server were running 4.7 (or later) instead of 4.5,
you could enable cursorMark for SolrEntityProcessor in 6.5.0 as Erick
mentioned, and pagination would be immensely more efficient.
Unfortunately, 4.5 doesn't support cursorMark.

https://issues.apache.org/jira/browse/SOLR-9668
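The version cutoff above can be encoded as a small check. This is a minimal
sketch assuming the source server's version is already available as a
dotted string such as `4.5.0` (how it is fetched from the server is left
out). It compares numeric tuples because a plain string comparison would
wrongly rank `4.10` below `4.7`.

```python
def supports_cursor_mark(version):
    """Return True if a dotted Solr version string is 4.7 or later."""
    # Keep only major and minor; pad a bare major version like "5" with minor 0.
    parts = [int(p) for p in version.split(".")[:2]]
    major, minor = (parts + [0])[:2]
    return (major, minor) >= (4, 7)
```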

Any chance you could upgrade the source server to a later 4.x version?

Thanks,
Shawn
