Well, it's cursor or nothing. Or some sort of custom code to read the Lucene indexes directly (good luck with deleted documents, etc.).
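For example, something along these lines (an untested Python sketch, not anything official; it assumes your uniqueKey field is called "id", the core is reachable at http://localhost:8983/solr/collection1, and you keep hitting each node with distrib=false as you already do):

```python
import csv
import requests

# Assumed values for illustration only; adjust URL, field name and page size to your setup.
SOLR_URL = "http://localhost:8983/solr/collection1/select"
UNIQUE_KEY = "id"   # cursorMark requires a sort on the uniqueKey field
ROWS = 2000

def dump_all(out_path):
    cursor = "*"    # initial cursorMark value
    writer = None
    with open(out_path, "w", newline="") as out:
        while True:
            params = {
                "q": "*:*",
                "sort": UNIQUE_KEY + " asc",  # stable sort on the unique key
                "rows": ROWS,
                "cursorMark": cursor,
                "wt": "json",
                "distrib": "false",           # per-node dump, as in your current approach
            }
            resp = requests.get(SOLR_URL, params=params).json()
            docs = resp["response"]["docs"]
            if writer is None and docs:
                # Use the first document's fields as CSV columns
                writer = csv.DictWriter(out, fieldnames=sorted(docs[0].keys()),
                                        extrasaction="ignore")
                writer.writeheader()
            for doc in docs:
                writer.writerow(doc)
            next_cursor = resp["nextCursorMark"]
            if next_cursor == cursor:         # cursor did not advance: we are done
                break
            cursor = next_cursor

dump_all("solr_dump.csv")
```

You could also ask Solr for wt=csv directly, but picking nextCursorMark out of the response is easier with the JSON writer, so the sketch writes the CSV on the client side.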
I think your understanding is correct.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/


On 12 March 2015 at 18:10, vsriram30 <vsrira...@gmail.com> wrote:
> Hi All,
>
> I have a SolrCloud cluster of 20 nodes, with each node holding close to
> 20 million records; the total index size is around 400GB (20GB per node x
> 20 nodes). I am trying to find the best way to dump out the entire Solr
> content, say in CSV format.
>
> I run successive queries, incrementing the start param by 2000 while
> keeping rows at 2000, and I hit each individual server with distrib=false
> so that I don't overload the top-level server and cause timeouts between
> the top-level and lower-level servers. Solr responds very quickly while
> the start param is in the lower millions (< 2 million). As the start param
> grows towards 16 million, Solr takes almost 2 to 3 minutes to return those
> 2000 records for a single query. I assume this is because it has to skip
> over all the lower-positioned documents to reach a start offset above
> 16 million before it can return results.
>
> Is there a better way to do this? I saw the cursor feature in the Solr
> pagination wiki, but it is mentioned that it requires a sort on a unique
> field. Would it make sense for my use case to sort on my Solr key field
> (the unique key field) with rows at 2000 and keep using nextCursorMark to
> dump out all the documents in CSV format?
>
> Thanks,
> Sriram
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-way-to-dump-out-entire-solr-content-tp4192734.html
> Sent from the Solr - User mailing list archive at Nabble.com.