SolrEntityProcessor is fine for small amounts of data but not useful for an index this large. The problem is that deep paging through search results is expensive: as the "start" value of a query increases, so does the cost of the query. You are much better off just re-indexing the data.
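To make the cost of a growing "start" concrete, here is a minimal Python sketch of a simplified model. It assumes the searcher collects hits into a bounded top-N priority queue (roughly how Lucene gathers results): to return rows [start, start+rows) it must rank and keep start+rows hits, then throw away the first start of them. The function names (`fetch_page`, `hits_collected`, `total_docs_collected`) are hypothetical and for illustration only; this is not SolrEntityProcessor's actual code.

```python
import heapq
from typing import List, Sequence


def fetch_page(scores: Sequence[float], start: int, rows: int) -> List[int]:
    """Return doc ids for ranks [start, start + rows), best score first.

    Simplified model of offset paging: keep the top (start + rows) hits
    in a bounded heap, then discard the first `start` of them.
    """
    top = heapq.nlargest(start + rows, range(len(scores)), key=lambda i: scores[i])
    return top[start:]


def hits_collected(num_docs: int, start: int, rows: int) -> int:
    # Size of the top-N queue needed to serve a single page request.
    return min(start + rows, num_docs)


def total_docs_collected(num_docs: int, rows: int) -> int:
    """Total hits kept across every page request when walking the whole index."""
    total = 0
    for start in range(0, num_docs, rows):
        total += hits_collected(num_docs, start, rows)
    return total


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5, 0.7, 0.3]
    print(fetch_page(scores, 0, 2))  # [1, 3]
    print(fetch_page(scores, 2, 2))  # [2, 4]
    # The first page is cheap; the last page has to collect the whole index.
    print(hits_collected(100_000_000, 0, 2000))           # 2000
    print(hits_collected(100_000_000, 99_998_000, 2000))  # 100000000
```

In this model, walking 100M documents with rows="2000" means `total_docs_collected(100_000_000, 2000)` returns 2,500,050,000,000, about 25,000 times the 100M hits a single streaming pass would touch. Each successive page is more expensive than the one before, which fits an import rate that keeps dropping the longer the job runs.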
On Mon, Jun 10, 2013 at 11:19 PM, Mingfeng Yang <mfy...@wisewindow.com> wrote:
> I am trying to migrate 100M documents from a Solr index (v3.6) to a SolrCloud
> index (v4.1, 4 shards) using SolrEntityProcessor. My data-config.xml looks
> like this:
>
> <dataConfig>
>   <document>
>     <entity name="sep" processor="SolrEntityProcessor"
>             url="http://10.64.35.117:8995/solr/" query="*:*" rows="2000"
>             fl="author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url"
>     />
>   </document>
> </dataConfig>
>
> Initially, the data import rate is about 1K docs/second, but it eventually
> decreases to 20 docs/second after running for tens of hours.
>
> The last time I tried a data import with SolrEntityProcessor, the transfer
> rate was as high as 3K docs/second.
>
> Does anyone have any clues about what could cause the slowdown?
>
> Thanks,
> Ming-

--
Regards,
Shalin Shekhar Mangar.