Re: Merging multiple Solr Indexes

J.J. Larrea Mon, 08 Oct 2007 08:53:48 -0700

At 9:51 PM -0700 10/7/07, Chris Hostetter wrote:
>: Thanks for the pointer. After two silent days waiting for reply,
>: I decided to  implement a command line for that. Works like a charm !!!
>
>well, sometimes people just don't post because they don't know the
>answer to something (better than 50 people posting "i don't know").


And then there's the case of people who intend to respond and sometimes even 
take the time to start on a response, but then get sidetracked by other things, 
and by the time they get around to hitting "send" the point has already been 
mooted by other posters... there's so much useful and quick commentary on 
solr-user one has to be quick!

Thus in light of your having given up and written your own merger, and Hoss' 
mention of the built-in org.apache.lucene.misc.IndexMergeTool (which I wish I'd 
known about), I didn't send the following, which I'm only sending now in case 
someone else might find BeanShell a useful tool for rapid prototyping, and with 
apologies for cluttering the list with something which is technically now 
off-topic.

- J.J.

>At 1:23 PM +0200 10/6/07, Ycrux wrote:
>>Is there a simple way (or command line tool)
>>to merge different Solr indexes (located on different machines)
>>into one ?
>
>Perhaps someone else can think of a way to do this that capitalizes on 
>Solr-specific features, but it can certainly be done using standard Lucene 
>calls if the remote index is accessible as a filesystem mount, e.g. via NFS.
>
>I have a lot of little Lucene helper scripts written in BeanShell (available 
>and documented at http://www.beanshell.org/) to save the bother of figuring 
>out where to put the classes and get them in my classpath; while BSH is 
>interpreted, since all of the work is done in Lucene code there's no 
>performance issue.  All you need is a bsh.jar and a lucene.jar in the 
>classpath.
>
>----- merge.bsh -----
>
>#!/usr/bin/java bsh.Interpreter
>
>import org.apache.lucene.analysis.standard.StandardAnalyzer;
>import org.apache.lucene.index.IndexReader;
>import org.apache.lucene.index.IndexWriter;
>
>
>if( bsh.args.length < 2 ) {
>        print( "Usage: Merge [-create] <dest-index> <src-index>"
>                + " [ <src-index2> ... ]" );
>        return(-1);
>}
>
>int argnum = 0;
>
>boolean create = false;
>
>if( "-create".equals(bsh.args[argnum]) ) {
>    create = true; ++argnum;
>}
>
>String dstName = bsh.args[argnum++];
>
>java.util.ArrayList readerList = new java.util.ArrayList();
>
>while( argnum < bsh.args.length ) {
>    String srcName=bsh.args[argnum++];
>    IndexReader reader = IndexReader.open(srcName);
>    print( srcName + ":\t" + reader.numDocs() + " documents");
>    readerList.add( reader );
>}
>
>IndexReader[] readerArray = new IndexReader[ readerList.size() ];
>for( int i = 0; i < readerArray.length; i++ )
>        readerArray[i] = (IndexReader)readerList.get(i);
>readerList = null;
>
>IndexWriter writer = null;
>
>try {
>        print( (create ? "Creating " : "Opening ") + dstName + " for merge");
>        writer = new IndexWriter(dstName, new StandardAnalyzer(), create);
>        if( readerArray.length > 0 ) {
>                t0 = System.currentTimeMillis();
>                c0 = writer.docCount();
>                print( dstName + ":\t" + c0 + " documents");
>                writer.addIndexes( readerArray );
>                t1 = System.currentTimeMillis();
>                c1 = writer.docCount();
>                print( "Index " + dstName + " went from " + c0 + " to " + c1
>                        + " (" + (c1 - c0) + ") documents in "
>                        + (t1 - t0)/1000.0 + " sec (i.e. "
>                        + ((t1 - t0) / ((c1 - c0)*1.0)) + " millisec each)" );
>        }
>}
>catch( Exception ex ) {
>        ex.printStackTrace();
>}
>finally {
>        if( writer != null )
>                writer.close();
>        for( int i = 0; i < readerArray.length; i++ )
>            if( readerArray[i] != null )
>                        readerArray[i].close();
>}
>
>----- /merge.bsh -----
>
>Also note that it is often much faster to merge n indexes into a new empty 
>index (e.g. with the -create option) than to merge n-1 indexes into an 
>existing index, due to the pre- and post-optimizations that addIndexes does.
>
>- J.J.
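
For anyone who wants to try the script, an invocation might look roughly like the sketch below. The jar file names and all index paths are placeholders I made up for illustration (substitute your own), and it assumes the remote indexes are already mounted locally as described above. It only assembles and prints the command so it can be checked before anything touches an index.

```shell
# Hypothetical locations; adjust to your installation.
BSH_JAR=bsh.jar
LUCENE_JAR=lucene.jar
DEST=/data/merged          # new, empty destination index
SRC1=/mnt/host1/index      # remote indexes mounted locally, e.g. via NFS
SRC2=/mnt/host2/index

# Assemble the command; -create tells merge.bsh to start a fresh index.
CMD="java -cp $BSH_JAR:$LUCENE_JAR bsh.Interpreter merge.bsh -create $DEST $SRC1 $SRC2"

# Print for inspection; once it looks right, run it with: eval "$CMD"
echo "$CMD"
```

Dropping -create from the command would instead merge the sources into an already existing index at the destination path.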
