This is actually something I will do quite frequently. I basically export from Solr into a CSV file as part of a workflow sequence. CSV is nice and fast, but does not have the ZooKeeper integration that I like with SolrJ.

On Mon, May 6, 2013 at 10:11 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 5/6/2013 10:48 AM, Kevin Osborn wrote:
>
>> I am looking to export a large amount of data from Solr. This export will
>> be done by a Java application and then written to file. Initially, I was
>> thinking of using direct HTTP calls and using the CSV response writer. And
>> then my Java application can quickly parse each line from a stream.
>>
>> But, with SolrCloud, I prefer to use SolrJ due to its communication with
>> Zookeeper. Is there any way to use the CSV response writer with SolrJ?
>>
>> Would the overhead of using SolrJ's "solrbin" format be much slower than
>> the CSV response writer?
>>
>
> What do you intend to do with the exported data? If you're going to use
> it to import into a new Solr index, you might be better off using the
> dataimport handler with SolrEntityProcessor. Just point it at one of your
> servers and include the collection name in the URL.
>
> If the export will have other uses and CSV format will work for you, that
> would probably be more efficient than something you could whip together
> quickly with SolrJ. If you've got really excellent java skills and have a
> lot of time to work on it, you might be able to write something efficient,
> but Solr can already do it.
>
> If you plan to page through your data rather than grab it all with one
> query, it is MUCH more efficient to use a range query on a field with
> sequential data than to use the start and rows parameters. This is
> *especially* true if you're using a sharded index, which is typically the
> case with SolrCloud.
>
> By the way, I am assuming that this process will be a one-time (or very
> rare) thing for migration purposes, or possibly something that you
> occasionally do for some kind of index verification.
> If this is something that you'll be doing all the time, then you probably
> want to develop a SolrJ application.
>
> Thanks,
> Shawn

-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677
SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
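
For readers following the thread: the range-query paging Shawn describes, combined with the CSV response writer over plain HTTP, might look roughly like the sketch below. It is an illustration under stated assumptions, not code from either poster. The field name `seq` (a unique, numeric, sequential field), the class and method names, and the host and collection in the usage note are all hypothetical. It also assumes the CSV writer emits columns in `fl` order, that no value contains an embedded newline, and that the Solr version accepts the mixed-bracket range `{n TO *]` (exclusive lower bound). It uses only the JDK, so it has none of SolrJ's ZooKeeper awareness; the URL must point at a node that is up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Writer;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SolrCsvRangeExport {

    /** Opens one page of CSV results. An interface so the HTTP call can be swapped out. */
    public interface PageSource {
        BufferedReader open(String url) throws IOException;
    }

    /** Plain HTTP GET against a Solr node (no ZooKeeper lookup, no retries). */
    public static PageSource http() {
        return url -> new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8));
    }

    private static String enc(String s) throws IOException {
        return URLEncoder.encode(s, "UTF-8");
    }

    /** Builds the select URL for the next page: documents with seqField > lastSeq. */
    public static String pageUrl(String selectUrl, String seqField, String fl,
                                 long lastSeq, int rows) throws IOException {
        String fq = seqField + ":{" + lastSeq + " TO *]"; // exclusive lower bound
        return selectUrl
                + "?q=" + enc("*:*")
                + "&fq=" + enc(fq)
                + "&sort=" + enc(seqField + " asc")
                + "&fl=" + enc(seqField + "," + fl)       // seqField is requested first
                + "&rows=" + rows
                + "&wt=csv&csv.header=false";
    }

    /**
     * Pages through the index with a range filter instead of start/rows,
     * copying each CSV line to out. Returns the number of lines written.
     */
    public static long export(PageSource source, String selectUrl, String seqField,
                              String fl, long startAfter, int rows, Writer out)
            throws IOException {
        long lastSeq = startAfter;
        long total = 0;
        while (true) {
            int inPage = 0;
            try (BufferedReader in = source.open(
                    pageUrl(selectUrl, seqField, fl, lastSeq, rows))) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue;
                    }
                    // The first column is assumed to be the numeric seq value.
                    int comma = line.indexOf(',');
                    lastSeq = Long.parseLong(comma < 0 ? line : line.substring(0, comma));
                    out.write(line);
                    out.write('\n');
                    inPage++;
                }
            }
            total += inPage;
            if (inPage < rows) {
                return total; // a short (or empty) page means nothing is left
            }
        }
    }
}
```

A call such as `SolrCsvRangeExport.export(SolrCsvRangeExport.http(), "http://localhost:8983/solr/collection1/select", "seq", "name", -1L, 10000, writer)` would then stream the whole collection to `writer` in pages of 10000 rows. Because each request filters on `seq` rather than skipping `start` rows, the cost of a page should not grow with how deep into the result set it is, which is the point of Shawn's advice for sharded indexes.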