Hi All,

I have a SolrCloud cluster of 20 nodes, each holding close to 20 million
records, with a total index size of around 400GB (20GB per node x 20
nodes). I am trying to find the best way to dump the entire Solr data out
in, say, CSV format.

Currently I issue successive queries, keeping rows at 2000 and incrementing
the start param by 2000 each time, and hitting each individual server with
distrib=false so that I don't overload the top-level server and cause
timeouts between the top-level and lower-level servers. Solr responds very
quickly while the start param is in the lower millions (< 2 million). As
the start param grows toward 16 million, Solr takes almost 2 to 3 minutes
to return those 2000 records for a single query. I assume this is because
Solr has to collect and skip over all the preceding documents in the index
to reach a start offset beyond 16 million before it can return the results.
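For reference, here is roughly what my per-node paging loop looks like
(a Python sketch using the requests library; the host name, collection
name, and per-node doc count are placeholders):

    import requests  # third-party HTTP client

    node = "http://solr-node-01:8983/solr/collection1"  # placeholder node URL
    rows = 2000
    docs_on_node = 20_000_000  # roughly 20M docs per node

    with open("node01.csv", "w") as out:
        for start in range(0, docs_on_node, rows):
            r = requests.get(node + "/select", params={
                "q": "*:*",
                "start": start,      # this offset is what gets slow when large
                "rows": rows,
                "distrib": "false",  # query only this node's local index
                "wt": "csv",
                # emit the CSV header only on the first page
                "csv.header": "true" if start == 0 else "false",
            })
            r.raise_for_status()
            out.write(r.text)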

Is there a better way to do this? I saw the cursor feature in the Solr
pagination wiki, but it mentions that it requires a sort on a unique field.
Would it make sense for my use case to sort on my Solr key field (the
uniqueKey field), keep rows at 2000, and keep following nextCursorMark to
dump out all the documents in CSV format?
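Something like the sketch below is what I have in mind. It assumes the
uniqueKey field is named "id" and uses a JSON response because, as far as
I can tell, nextCursorMark is not included in the CSV output; the CSV is
written locally instead. Host, collection, and field names are
placeholders:

    import csv
    import requests  # third-party HTTP client

    node = "http://solr-node-01:8983/solr/collection1"  # placeholder URL
    fields = ["id", "field1", "field2"]  # placeholder export fields
    cursor = "*"  # initial cursorMark

    with open("dump.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        while True:
            r = requests.get(node + "/select", params={
                "q": "*:*",
                "rows": 2000,
                "fl": ",".join(fields),
                "sort": "id asc",     # cursorMark requires a sort on uniqueKey
                "cursorMark": cursor,
                "wt": "json",
            })
            r.raise_for_status()
            data = r.json()
            for doc in data["response"]["docs"]:
                writer.writerow(doc)  # multivalued fields would need flattening
            next_cursor = data["nextCursorMark"]
            if next_cursor == cursor:  # unchanged mark means no more results
                break
            cursor = next_cursor

From what I understand, the loop is done when Solr returns a nextCursorMark
equal to the cursorMark that was just sent.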

Thanks,
Sriram



