Look at how the older rsync-based snapshooter works: it uses the Unix rsync program to efficiently spot and copy just the updated files in the master index. It runs from each query slave, just like Java replication. Unlike Java replication, it uses only the SSH copy protocol and does not talk to the Solr program doing the master indexing.
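Since the CMS is already Java, here is a rough sketch of what taking a snapshot amounts to: hard-link every index file into a timestamped directory. The class name and paths are just examples, not anything shipped with Solr, and this is only an approximation of what the old script does.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.stream.Stream;

// Rough equivalent of the old snapshooter script: hard-link every file in a
// live Lucene index directory into a timestamped snapshot directory.
public class IndexSnapshooter {
    public static Path snapshot(Path indexDir, Path snapshotRoot) throws IOException {
        String stamp = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
        Path snapDir = snapshotRoot.resolve("snapshot." + stamp);
        Files.createDirectories(snapDir);
        try (Stream<Path> files = Files.list(indexDir)) {
            files.filter(Files::isRegularFile).forEach(f -> {
                try {
                    // Hard link, not a copy: near-instant, and Lucene index
                    // files are write-once, so the linked set stays usable.
                    Files.createLink(snapDir.resolve(f.getFileName()), f);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return snapDir;
    }

    public static void main(String[] args) throws IOException {
        // Example paths only -- point these at the CMS's real index location.
        Path snap = snapshot(Paths.get("/data/cms/index"),
                             Paths.get("/data/cms/snapshots"));
        System.out.println("Created " + snap);
    }
}

Hard links make the snapshot near-instant, and because Lucene never rewrites an existing index file you are not racing the writer on file contents.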
You can run the snapshooter against any directory holding a Lucene index; even an actively updated index works fine. The key to this replicator is that Lucene never leaves inconsistent data on disk: it writes the new data, then updates the master list of what the current data is, then deletes the old data. You can copy a Lucene index at any point in time and it will be consistent.

On Tue, Aug 7, 2012 at 9:25 AM, Robert Stewart <bstewart...@gmail.com> wrote:
> Hi,
>
> I have a client who uses Lucene in a home-grown CMS system they
> developed in Java. They have a lot of code that uses the Lucene API
> directly and they can't change it now. But they also need to use SOLR
> for some other apps which must use the same Lucene index data. So I
> need a good way to periodically replicate the Lucene index to
> SOLR. I know how to make efficient Lucene index snapshots from within
> their CMS Java app (basically using the same method as the old
> replication scripts, using hard links, etc.). Assuming I have a new
> index snapshot, how can I tell a running SOLR instance to start using
> the new index snapshot instead of its current index, and also how can
> I configure SOLR to use the latest "snapshot" directory on restart?
> Assume I create new index snapshots into a directory such that each
> new snapshot is a folder in format YYYYMMHHMMDDSS (timestamp). Is
> there any way to configure SOLR to look someplace for new index
> snapshots (some multi-core setup?).
>
> Thanks!

--
Lance Norskog
goks...@gmail.com