Re: Merging multiple Solr Indexes

Ycrux Mon, 08 Oct 2007 10:12:13 -0700

Seems good. Thanks

cheers
Y.


J.J. Larrea wrote:

At 9:51 PM -0700 10/7/07, Chris Hostetter wrote:

: Thanks for the pointer. After two silent days waiting for reply,
: I decided to  implement a command line for that. Works like a charm !!!

well, sometimes people just don't post because they don't know the
answer to something (better than 50 people posting "i don't know").


And then there's the case of people who intend to respond and sometimes even take the 
time to start on a response, but then get sidetracked by other things, and by the time 
they get around to hitting "send" the point has already been mooted by other 
posters... there's so much useful and quick commentary on solr-user one has to be quick!

Thus in light of your having given up and written your own merger, and Hoss' 
mention of the built-in org.apache.lucene.misc.IndexMergeTool (which I wish I'd 
known about), I didn't send the following, which I'm only sending now in case 
someone else might find BeanShell a useful tool for rapid prototyping, and with 
apologies for cluttering the list with something which is technically now 
off-topic.

- J.J.

At 1:23 PM +0200 10/6/07, Ycrux wrote:

Is there a simple way (or command line tool)
to merge different Solr indexes (located on different machines)
into one ?

Perhaps someone else can think of a way to do this that capitalizes on 
Solr-specific features, but it can certainly be done using standard Lucene 
calls if the remote index is accessible as a filesystem mount, e.g. via NFS.

I have a lot of little Lucene helper scripts written in BeanShell (available 
and documented at http://www.beanshell.org/) to save the bother of figuring out 
where to put the classes and get them in my classpath; while BSH is 
interpreted, since all of the work is done in Lucene code there's no 
performance issue.  All you need is a bsh.jar and a lucene.jar in the classpath.

----- merge.bsh -----

#!/usr/bin/java bsh.Interpreter

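// StandardAnalyzer is used below when opening the IndexWriter
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;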


if( bsh.args.length < 2 ) {
       print( "Usage: Merge [-create] <dest-index> <src-index> [ <src-index2> ... ]" );
       return(-1);
}

int argnum = 0;

boolean create = false;

if( "-create".equals(bsh.args[argnum]) ) {
   create = true; ++argnum;
}

String dstName = bsh.args[argnum++];

java.util.ArrayList readerList = new java.util.ArrayList();

while( argnum < bsh.args.length ) {
   String srcName=bsh.args[argnum++];
   IndexReader reader = IndexReader.open(srcName);
   print( srcName + ":\t" + reader.numDocs() + " documents");
   readerList.add( reader );
}

IndexReader[] readerArray = new IndexReader[ readerList.size() ];
for( int i = 0; i < readerArray.length; i++ )
       readerArray[i] = (IndexReader)readerList.get(i);
readerList = null;

IndexWriter writer = null;

try {
       print( (create ? "Creating " : "Opening ") + dstName + " for merge");
       writer = new IndexWriter(dstName, new StandardAnalyzer(), create);
       if( readerArray.length > 0 ) {
               t0 = System.currentTimeMillis();
               c0 = writer.docCount();
               print( dstName + ":\t" + c0 + " documents");
               writer.addIndexes( readerArray );
               t1 = System.currentTimeMillis();
               c1 = writer.docCount();
               print( "Index " + dstName + " went from " + c0 + " to " + c1
                       + " (" + (c1 - c0) + ") documents in "
                       + (t1 - t0)/1000.0 + " sec" + " (i.e. "
                       + ((t1 - t0) / ((c1 - c0)*1.0)) + " millisec each)" );
       }
}
catch( Exception ex ) {
       ex.printStackTrace();
}
finally {
       if( writer != null )
               writer.close();
       for( int i = 0; i < readerArray.length; i++ )
           if( readerArray[i] != null )
                       readerArray[i].close();
}

----- /merge.bsh -----
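
For anyone porting the script to plain Java, the argument handling can be pulled out and checked without Lucene on the classpath. The following is only a minimal sketch under that assumption: the class name MergeArgs and its fields are hypothetical and are not part of Lucene, BeanShell, or the script above. It follows the same rules as the script (at least two arguments, an optional leading -create, then the destination, then the sources).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Lucene or BeanShell): holds the parsed
// command line of the merge script.
final class MergeArgs {
    final boolean create;        // true when -create was given
    final String dest;           // destination index path
    final List<String> sources;  // source index paths, possibly empty

    private MergeArgs(boolean create, String dest, List<String> sources) {
        this.create = create;
        this.dest = dest;
        this.sources = sources;
    }

    // Mirrors the script: require at least two arguments, accept an optional
    // leading -create, then the destination, then the remaining sources.
    static MergeArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "Usage: Merge [-create] <dest-index> <src-index> [ <src-index2> ... ]");
        }
        int argnum = 0;
        boolean create = false;
        if ("-create".equals(args[argnum])) {
            create = true;
            ++argnum;
        }
        String dest = args[argnum++];
        List<String> sources =
            new ArrayList<>(Arrays.asList(args).subList(argnum, args.length));
        return new MergeArgs(create, dest, sources);
    }
}
```

As in the script, "-create <dest-index>" with no sources passes the length check and yields an empty source list, so the only effect is to create an empty destination index.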

Also note that it is often much faster to merge n indexes into a new empty 
index (e.g. with the -create option) than to merge n-1 indexes into an existing 
index, due to the pre- and post-optimizations that addIndexes does.

- J.J.
