Hey Shawn,

Unfortunately, we can't upgrade the existing cluster. That was my first approach as well.
Yes, SolrEntityProcessor is used, so it results in deep paging after a certain number of rows. I have observed that if data is imported for only 4 hours at a time, instead of for a larger period, the import process is much faster. Since we are importing several months of data, it would be nice if the dataimport could be scripted in bash or Python, but I can't find any documentation on it. Any pointers?

------------------------------
*From:* Shawn Heisey <apa...@elyograg.org>
*Sent:* Thursday, April 27, 2017 5:07 PM
*To:* solr-user@lucene.apache.org
*Subject:* Re: DIH Speed

On 4/27/2017 5:40 PM, Erick Erickson wrote:
> I'm unclear why DIH and deep paging are mixed. DIH is indexing and deep paging is querying.
>
> If it's querying, consider cursorMark or the /export handler.
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

Very likely they are using SolrEntityProcessor.

Vijay, if the source server were running 4.7 (or later) instead of 4.5, you could enable cursorMark for SolrEntityProcessor in 6.5.0 as Erick mentioned, and pagination would be immensely more efficient. Unfortunately, 4.5 doesn't support cursorMark.

https://issues.apache.org/jira/browse/SOLR-9668

Any chance you could upgrade the source server to a later 4.x version?

Thanks,
Shawn
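
On the scripting question above: DIH is driven by plain HTTP requests, so the 4-hour windows can be looped from a script. Below is a minimal Python sketch, not a tested solution. It assumes the handler is registered at /dataimport on a core called `targetcore` (both hypothetical), and that the SolrEntityProcessor entity in the DIH config picks up two custom request parameters, for example `query="timestamp:[${dataimporter.request.start} TO ${dataimporter.request.end}]"`, where `timestamp`, `start`, and `end` are names made up for illustration. Both range ends are inclusive there, so a document sitting exactly on a window boundary is fetched twice and simply overwrites itself as long as the uniqueKey is the same.

```python
import json
import time
import urllib.parse
import urllib.request
from datetime import datetime, timedelta

# Assumed URL: adjust host, core name, and handler path to your setup.
DIH_URL = "http://localhost:8983/solr/targetcore/dataimport"


def windows(start, end, hours=4):
    """Yield (window_start, window_end) pairs covering start..end."""
    step = timedelta(hours=hours)
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        yield cur, nxt
        cur = nxt


def solr_ts(dt):
    """Format a naive UTC datetime the way Solr date fields expect."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def dih(params):
    """Call the DIH handler and return the parsed JSON response."""
    query = urllib.parse.urlencode(dict(params, wt="json"))
    with urllib.request.urlopen(DIH_URL + "?" + query) as resp:
        return json.load(resp)


def import_range(start, end, hours=4, poll_seconds=10):
    """Run one full-import per window, waiting for each to finish."""
    for lo, hi in windows(start, end, hours):
        # clean=false keeps the documents imported by earlier windows.
        # "start" and "end" are the hypothetical custom parameters that
        # the DIH config reads via ${dataimporter.request.*}.
        dih({"command": "full-import", "clean": "false", "commit": "true",
             "start": solr_ts(lo), "end": solr_ts(hi)})
        # full-import returns immediately, so poll until DIH is idle.
        while dih({"command": "status"}).get("status") == "busy":
            time.sleep(poll_seconds)
```

Calling `import_range(datetime(2017, 1, 1), datetime(2017, 4, 1))` would then walk through the three months one window at a time. The same loop can be written in bash with curl against the same URLs.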