This is actually something I will be doing quite frequently: I export from
Solr into a CSV file as one step in a workflow.

The CSV response writer is nice and fast, but calling it over plain HTTP
loses the ZooKeeper integration that I like about SolrJ.
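For reference, the plain-HTTP route could look roughly like the minimal Java 11+ sketch below. The node URL, the collection name (collection1) and the field list are hypothetical placeholders. It only relies on the standard q, fl, rows and wt=csv request parameters. Because it talks to a single node over HTTP, it gets none of the ZooKeeper awareness that SolrJ provides. rows has to be set explicitly, since Solr returns only 10 rows by default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: plain-HTTP CSV export from one Solr node (no ZooKeeper awareness).
final class SolrCsvExport {

    // Builds a /select URI asking Solr's CSV response writer (wt=csv) for the given fields.
    static URI buildCsvUri(String selectUrl, String query, String fields, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("fl", fields);
        params.put("rows", Integer.toString(rows)); // Solr defaults to 10 rows otherwise
        params.put("wt", "csv");
        String queryString = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(selectUrl + "?" + queryString);
    }

    // Copies the response body to out line by line and returns the number of lines written
    // (header included; a quoted value that contains newlines spans several lines).
    static long copyLines(InputStream body, Writer out) {
        long count = 0;
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line);
                out.write('\n');
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Streams the export straight into a file instead of buffering it in memory.
    static long exportToFile(URI uri, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream body = response.body()) {
            if (response.statusCode() != 200) {
                throw new IOException("Solr returned HTTP " + response.statusCode());
            }
            try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                return copyLines(body, out);
            }
        }
    }
}
```

A call would look something like SolrCsvExport.exportToFile(SolrCsvExport.buildCsvUri("http://localhost:8983/solr/collection1/select", "*:*", "id,name", 1000000), Path.of("export.csv")). The returned count is a line count, not a document count.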


On Mon, May 6, 2013 at 10:11 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 5/6/2013 10:48 AM, Kevin Osborn wrote:
>
>> I am looking to export a large amount of data from Solr. This export will
>> be done by a Java application and then written to a file. Initially, I was
>> thinking of making direct HTTP calls that use the CSV response writer, so
>> that my Java application can quickly parse each line from a stream.
>>
>> But with SolrCloud, I prefer to use SolrJ because of its communication with
>> ZooKeeper. Is there any way to use the CSV response writer with SolrJ?
>>
>> Would SolrJ's binary ("javabin") format be much slower than the CSV
>> response writer?
>>
>
> What do you intend to do with the exported data?  If you're going to use
> it to import into a new Solr index, you might be better off using the
> DataImportHandler with SolrEntityProcessor.  Just point it at one of your
> servers and include the collection name in the URL.
>
> If the export will have other uses and the CSV format works for you, that
> would probably be more efficient than something you could whip together
> quickly with SolrJ.  If you've got really excellent Java skills and a lot
> of time to work on it, you might be able to write something efficient,
> but Solr's CSV response writer already does this.
>
> If you plan to page through your data rather than grab it all with one
> query, it is MUCH more efficient to use a range query on a field with
> sequential data than to use the start and rows parameters.  This is
> *especially* true if you're using a sharded index, which is typically the
> case with SolrCloud.
>
> By the way, I am assuming that this process will be a one-time (or very
> rare) thing for migration purposes, or possibly something that you
> occasionally do for some kind of index verification.  If this is something
> that you'll be doing all the time, then you probably want to develop a
> SolrJ application.
>
> Thanks,
> Shawn
>
>
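Shawn's suggestion of paging with a range query on a sequential field, instead of start and rows, could be sketched like this. It assumes a hypothetical integer field named seq whose values are unique and increasing, and a page fetcher standing in for the real request, whether SolrJ or plain HTTP. Each request only asks for values above the last one already seen. No shard ever has to collect and throw away start+rows documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of range-query paging; "seq" is a hypothetical unique, increasing integer field.
final class RangePager {

    // Request parameters for the page after lastSeen (null means the first page).
    static Map<String, String> pageParams(String field, Long lastSeen, int rows) {
        Map<String, String> p = new LinkedHashMap<>();
        // On an integer field, [lastSeen+1 TO *] skips everything already exported,
        // so start can stay at 0 on every request.
        String lower = (lastSeen == null) ? "*" : Long.toString(lastSeen + 1);
        p.put("q", field + ":[" + lower + " TO *]");
        p.put("sort", field + " asc");
        p.put("rows", Integer.toString(rows));
        return p;
    }

    // Drives the paging loop. fetchPage stands in for the real request and must return
    // the field values of the returned documents, in sort order.
    static List<Long> fetchAll(String field, int rows,
                               Function<Map<String, String>, List<Long>> fetchPage) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        List<Long> all = new ArrayList<>();
        Long last = null;
        while (true) {
            List<Long> page = fetchPage.apply(pageParams(field, last, rows));
            all.addAll(page);
            if (page.size() < rows) {
                return all; // a short (or empty) page means nothing is left
            }
            last = page.get(page.size() - 1);
        }
    }
}
```

The uniqueness assumption matters. If two documents shared a seq value at a page boundary, the [last+1 TO *] lower bound would skip the second one.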


-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677      SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614