I've implemented something like what is described in https://issues.apache.org/jira/browse/SOLR-3246. The idea is to add an update request processor at the end of the update chain in the core you want to copy. The processor converts the SolrInputDocument to XML (SolrJ's ClientUtils.toXML does this) and dumps the XML into a file that can be fed back into Solr with curl. If you have many documents you will probably want to distribute the XML files across different directories, using some common prefix of the id field.
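A minimal sketch of such a processor (untested; the dump directory, the two-character id bucketing, and the class names are my own choices, and field/method names can differ slightly between Solr versions):

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    import org.apache.solr.client.solrj.util.ClientUtils;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class XmlDumpProcessorFactory extends UpdateRequestProcessorFactory {

      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new XmlDumpProcessor(next);
      }

      static class XmlDumpProcessor extends UpdateRequestProcessor {

        XmlDumpProcessor(UpdateRequestProcessor next) {
          super(next);
        }

        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.solrDoc;
          String id = doc.getFieldValue("id").toString();
          // Bucket the dumps by the first two characters of the id so a
          // single directory doesn't collect hundreds of thousands of files.
          File dir = new File("/tmp/solr-dump", id.substring(0, 2));
          dir.mkdirs();
          // ClientUtils.toXML produces the <doc>...</doc> fragment; wrap it
          // in <add> so the file can be POSTed straight back with curl.
          FileWriter w = new FileWriter(new File(dir, id + ".xml"));
          try {
            w.write("<add>" + ClientUtils.toXML(doc) + "</add>");
          } finally {
            w.close();
          }
          super.processAdd(cmd); // pass the document on down the chain
        }
      }
    }

Register the factory at the end of an updateRequestProcessorChain in solrconfig.xml; each dumped file can then be re-fed with something like curl 'http://localhost:8983/solr/newcore/update?commit=true' -H 'Content-Type: text/xml' --data-binary @ab/123.xml.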
On Fri, Apr 6, 2012 at 5:18 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> I am considering writing a small tool that would read from one solr
>> core and write to another as a means of quick re-indexing of data. I
>> have a large-ish set (hundreds of thousands) of documents that I've
>> already parsed with Tika, and I keep changing bits and pieces in the
>> schema and config to try new things often. Instead of having to go
>> through the process of re-indexing from docs (and some DBs), I
>> thought it may be much faster to just read from one core and write
>> into a new core with a new schema, analysers and/or settings.
>>
>> I was wondering if anyone else has done anything similar already? It
>> would be handy if I could use this sort of thing to spin off another
>> core, write to it, and then swap the two cores, discarding the older
>> one.
>
> You might find these relevant:
>
> https://issues.apache.org/jira/browse/SOLR-3246
>
> http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
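For the core-to-core copy tool the quoted question describes, a rough SolrJ sketch (untested; the core URLs, page size, and sort field are placeholders, and SolrQuery's sort setters vary between SolrJ releases) might look like:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class CopyCore {
      public static void main(String[] args) throws Exception {
        HttpSolrServer src = new HttpSolrServer("http://localhost:8983/solr/oldcore");
        HttpSolrServer dst = new HttpSolrServer("http://localhost:8983/solr/newcore");
        final int rows = 500;
        for (int start = 0; ; start += rows) {
          SolrQuery q = new SolrQuery("*:*");
          q.setStart(start);
          q.setRows(rows);
          q.setSortField("id", SolrQuery.ORDER.asc); // stable order for paging
          QueryResponse rsp = src.query(q);
          if (rsp.getResults().isEmpty()) break;
          for (SolrDocument out : rsp.getResults()) {
            // Copy field values into a fresh input document;
            // getFieldValues also picks up multi-valued fields.
            SolrInputDocument in = new SolrInputDocument();
            for (String f : out.getFieldNames()) {
              in.addField(f, out.getFieldValues(f));
            }
            dst.add(in);
          }
        }
        dst.commit();
      }
    }

The main caveat with reading from an existing index (and this applies to SolrEntityProcessor as well) is that you only get back what was stored, so indexed-but-not-stored fields can't be copied this way.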