Hi,

I need to reindex 800M docs that exist in an older Solr into a new schema.
As all fields are stored, I thought I was in luck, since I could:

- use wt=csv
- combined with cursorMark

to easily script something that would export/index in chunks of 1M docs or
so. CSV output seems very efficient for this sort of thing.
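
For reference, the per-chunk request I had in mind looks roughly like this
(collection name and chunk size are just placeholders; the sort has to
include the uniqueKey for cursorMark to work):

  /solr/oldcollection/select?q=*:*&sort=id+asc&rows=1000000&wt=csv&cursorMark=*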

But, sadly, I found that there is no way to get the nextCursorMark after the
first request, as the CSV writer just outputs plain CSV data for the fields,
dropping all other information from the response!

This is really unfortunate, as CSV + cursorMark seems like the perfect fit
for reindexing this huge index (it's a one-time thing).

Does anyone see a way to still use this approach? I would prefer not to have
to write Java code just to get the nextCursorMark.

So far I have thought of:
- using wt=json, but then I need to post-process the returned JSON to strip
the response header etc. before reindexing, which is a pain.
- sending two calls for each chunk (with the same cursorMark both times):
one with wt=csv to get the data, and another with wt=json to get the
nextCursorMark (ignoring the data, maybe using fl=id so little data comes
back). I did some tests and this seems like it should work; see the sketch
below.
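
In case it helps the discussion, this is roughly the script I have in mind
for option 2. It is an untested sketch: the host/collection URLs, the "id"
uniqueKey, the chunk size and the use of the Python requests library are
just my assumptions.

#!/usr/bin/env python3
# Rough sketch of option 2: two calls per chunk with the same cursorMark.
# Hosts, collection names and the uniqueKey ("id") are placeholders.
import requests

OLD = "http://oldhost:8983/solr/oldcollection"  # source Solr (placeholder)
NEW = "http://newhost:8983/solr/newcollection"  # target Solr (placeholder)
PARAMS = {"q": "*:*", "sort": "id asc", "rows": 1000000}  # sort must include the uniqueKey

cursor = "*"
while True:
    # Call 1: wt=json with fl=id, only to learn the nextCursorMark for this page.
    meta = requests.get(OLD + "/select",
                        params=dict(PARAMS, wt="json", fl="id", cursorMark=cursor)).json()
    next_cursor = meta["nextCursorMark"]
    if next_cursor == cursor:
        break  # same mark returned twice: the result set is fully consumed

    # Call 2: identical query with wt=csv to fetch the actual documents.
    chunk = requests.get(OLD + "/select",
                         params=dict(PARAMS, wt="csv", cursorMark=cursor)).text

    # Push the CSV chunk (header line included) into the new collection.
    r = requests.post(NEW + "/update", data=chunk.encode("utf-8"),
                      headers={"Content-Type": "application/csv"})
    r.raise_for_status()

    cursor = next_cursor

requests.get(NEW + "/update", params={"commit": "true"})  # final commit

I haven't checked how multivalued fields come out in CSV yet, so that part
may need extra update parameters.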

I guess I will go with the second option, but does anyone have a better idea?
thanks
xavier
