Hi All,

I have a SolrCloud cluster of 20 nodes, each holding close to 20 million records, for a total index size of around 400 GB (20 GB per node x 20 nodes). I am trying to find the best way to dump the entire Solr content out in, say, CSV format.
Right now I page through with successive queries, incrementing the start param by 2000 and keeping rows at 2000, hitting each individual server with distrib=false so that I don't overload the top-level server and cause timeouts between the top-level and lower-level servers. Solr responds very quickly while the start param is low (under ~2 million), but as start grows towards 16 million, a single query takes almost 2 to 3 minutes to return those 2000 records. I assume this is because Solr has to collect and skip over all the documents before a start offset of > 16 million before it can return the results.

Is there a better way to do this? I saw the cursor feature in the Solr pagination wiki, but it is mentioned that it requires a sort on a unique field. Would it make sense for my use case to sort on my Solr uniqueKey field with rows=2000 and keep following nextCursorMark to dump out all the documents in CSV format?
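In case it helps make the question concrete, here is a rough sketch of the cursorMark loop I have in mind (Python with requests; the node URL, collection name, field list, and the uniqueKey name "id" are placeholders for my actual setup, not anything Solr mandates):

    import csv
    import requests

    # Placeholders -- point this at one node's core; distrib=false below
    # keeps the query local to that node, as in my current approach.
    SOLR_URL = "http://solr-node1:8983/solr/mycollection/select"
    FIELDS = ["id", "title"]   # fields to export (placeholder list)
    ROWS = 2000

    def dump_to_csv(out_path):
        cursor = "*"  # initial cursorMark
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(FIELDS)
            while True:
                params = {
                    "q": "*:*",
                    "rows": ROWS,
                    "fl": ",".join(FIELDS),
                    "sort": "id asc",    # cursorMark requires a sort on the uniqueKey
                    "cursorMark": cursor,
                    "distrib": "false",  # query only this node
                    "wt": "json",
                }
                resp = requests.get(SOLR_URL, params=params).json()
                for doc in resp["response"]["docs"]:
                    writer.writerow([doc.get(fld, "") for fld in FIELDS])
                next_cursor = resp["nextCursorMark"]
                if next_cursor == cursor:  # cursor stopped advancing: no more docs
                    break
                cursor = next_cursor

    dump_to_csv("solr_dump.csv")

The attraction over start/rows paging, if I understand the wiki correctly, is that the cursor encodes the sort position of the last document returned, so each request seeks straight to it instead of re-collecting and discarding everything before the offset.

Thanks,
Sriram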