SolrEntityProcessor is fine for small amounts of data but is not well suited
to such a large index. The problem is that deep paging in search results is
expensive: as the "start" value for a query increases, so does the cost of
the query. You are much better off just re-indexing the data.
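
To put a rough number on the deep-paging cost described above, here is a
small Python sketch. It is not how Solr or SolrEntityProcessor is implemented.
It assumes a simplified cost model in which a page at offset "start" makes the
source index collect and rank about start + rows documents. The function name
is made up for illustration; the 100M documents and rows=2000 come from the
original message.

```python
def docs_ranked_with_start_paging(total_docs: int, rows: int) -> int:
    """Estimate the total documents collected and ranked across all pages.

    Assumed cost model (a simplification, not a measurement of Solr):
    fetching the page at offset `start` costs about start + page_size.
    """
    total = 0
    for start in range(0, total_docs, rows):
        # The last page may be shorter than `rows`.
        page_size = min(rows, total_docs - start)
        total += start + page_size
    return total


if __name__ == "__main__":
    total_docs = 100_000_000  # index size from the original message
    rows = 2000               # rows="2000" from the data-config.xml
    work = docs_ranked_with_start_paging(total_docs, rows)
    # Ratio of modeled work to a single sequential pass over the index.
    print(f"documents ranked: {work:,}")
    print(f"overhead vs. one pass: {work / total_docs:,.1f}x")
```

Under this model the total work grows quadratically with the number of pages.
For 100M documents at 2000 rows per page, it comes to about 25,000 times the
cost of reading each document once, which fits an import rate that keeps
falling the longer it runs.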


On Mon, Jun 10, 2013 at 11:19 PM, Mingfeng Yang <mfy...@wisewindow.com> wrote:

> I am trying to migrate 100M documents from a Solr index (v3.6) to a SolrCloud
> index (v4.1, 4 shards) by using SolrEntityProcessor. My data-config.xml looks
> like this:
>
> <dataConfig>
>   <document>
>     <entity name="sep" processor="SolrEntityProcessor"
>             url="http://10.64.35.117:8995/solr/" query="*:*" rows="2000"
>             fl="author_class,authorlink,author_location_text,author_text,author,category,date,dimension,entity,id,language,md5_text,op_dimension,opinion_text,query_id,search_source,sentiment,source_domain_text,source_domain,text,textshingle,title,topic,topic_text,url" />
>   </document>
> </dataConfig>
>
> Initially, the data import rate is about 1K docs/second, but it eventually
> decreases to 20 docs/second after running for tens of hours.
>
> The last time I tried a data import with SolrEntityProcessor, the transfer
> rate was as high as 3K docs/second.
>
> Does anyone have any clue what could cause the slowdown?
>
> Thanks,
> Ming-
>



-- 
Regards,
Shalin Shekhar Mangar.
