Well, it's cursor or nothing. Or some sort of custom code to read the Lucene indexes directly (good luck with deleted documents, etc.).
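For example, something along these lines (an untested Python sketch, not anything official; it assumes your uniqueKey field is called "id", the core is reachable at http://localhost:8983/solr/collection1, and you keep hitting each node with distrib=false as you already do):

```python
import csv
import requests

# Assumed values for illustration only; adjust URL, field name and page size to your setup.
SOLR_URL = "http://localhost:8983/solr/collection1/select"
UNIQUE_KEY = "id"   # cursorMark requires a sort on the uniqueKey field
ROWS = 2000

def dump_all(out_path):
    cursor = "*"    # initial cursorMark value
    writer = None
    with open(out_path, "w", newline="") as out:
        while True:
            params = {
                "q": "*:*",
                "sort": UNIQUE_KEY + " asc",  # stable sort on the unique key
                "rows": ROWS,
                "cursorMark": cursor,
                "wt": "json",
                "distrib": "false",           # per-node dump, as in your current approach
            }
            resp = requests.get(SOLR_URL, params=params).json()
            docs = resp["response"]["docs"]
            if writer is None and docs:
                # Use the first document's fields as CSV columns
                writer = csv.DictWriter(out, fieldnames=sorted(docs[0].keys()),
                                        extrasaction="ignore")
                writer.writeheader()
            for doc in docs:
                writer.writerow(doc)
            next_cursor = resp["nextCursorMark"]
            if next_cursor == cursor:         # cursor did not advance: we are done
                break
            cursor = next_cursor

dump_all("solr_dump.csv")
```

You could also ask Solr for wt=csv directly, but picking nextCursorMark out of the response is easier with the JSON writer, so the sketch writes the CSV on the client side.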
I think your understanding is correct.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/


On 12 March 2015 at 18:10, vsriram30 <vsrira...@gmail.com> wrote:
> Hi All,
>
> I have a SolrCloud cluster of 20 nodes, with each node holding close to
> 20 million records; the total index size is around 400GB (20GB per node x
> 20 nodes). I am trying to find the best way to dump out the entire Solr
> content, say in CSV format.
>
> I run successive queries, incrementing the start param by 2000 while
> keeping rows at 2000, and I hit each individual server with distrib=false
> so that I don't overload the top-level server and cause timeouts between
> the top-level and lower-level servers. Solr responds very quickly while
> the start param is in the lower millions (< 2 million). As the start param
> grows towards 16 million, Solr takes almost 2 to 3 minutes to return those
> 2000 records for a single query. I assume this is because it has to skip
> over all the lower-positioned documents to reach a start offset above
> 16 million before it can return results.
>
> Is there a better way to do this? I saw the cursor feature in the Solr
> pagination wiki, but it is mentioned that it requires a sort on a unique
> field. Would it make sense for my use case to sort on my Solr key field
> (the unique key field) with rows at 2000 and keep using nextCursorMark to
> dump out all the documents in CSV format?
>
> Thanks,
> Sriram
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734.html
> Sent from the Solr - User mailing list archive at Nabble.com.