At 9:51 PM -0700 10/7/07, Chris Hostetter wrote:
>: Thanks for the pointer. After two silent days waiting for reply,
>: I decided to implement a command line for that. Works like a charm !!!
>
>well, sometimes people just don't post because they don't know the
>answer to something (better than 50 people posting "i don't know").

And then there's the case of people who intend to respond, and sometimes
even take the time to start on a response, but then get sidetracked by
other things; by the time they get around to hitting "send", the point
has already been mooted by other posters. There's so much useful and
quick commentary on solr-user that one has to be quick!

Thus, in light of your having given up and written your own merger, and
Hoss' mention of the built-in org.apache.lucene.misc.IndexMergeTool
(which I wish I'd known about), I didn't send the following. I'm sending
it now only in case someone else might find BeanShell a useful tool for
rapid prototyping, and with apologies for cluttering the list with
something that is technically now off-topic.

- J.J.

>At 1:23 PM +0200 10/6/07, Ycrux wrote:
>>Is there a simple way (or command line tool)
>>to merge different Solr indexes (located on different machines)
>>into one ?
>
>Perhaps someone else can think of a way to do this that capitalizes on
>Solr-specific features, but it can certainly be done using standard Lucene
>calls if the remote index is accessible as a filesystem mount, e.g. via NFS.
>
>I have a lot of little Lucene helper scripts written in BeanShell (available
>and documented at http://www.beanshell.org/) to save the bother of figuring
>out where to put the classes and get them in my classpath. BSH is
>interpreted, but since all of the work is done in Lucene code there's no
>performance issue. All you need is a bsh.jar and a lucene.jar in the
>classpath.
>
>----- merge.bsh -----
>
>#!/usr/bin/java bsh.Interpreter
>
>import org.apache.lucene.analysis.standard.StandardAnalyzer;
>import org.apache.lucene.index.IndexReader;
>import org.apache.lucene.index.IndexWriter;
>
>if( bsh.args.length < 2 ) {
>    print( "Usage: Merge [-create] <dest-index> <src-index> "
>           + "[ <src-index2> ... ]" );
>    return(-1);
>}
>
>int argnum = 0;
>
>// Optional leading flag: create the destination index from scratch.
>boolean create = false;
>
>if( "-create".equals(bsh.args[argnum]) ) {
>    create = true; ++argnum;
>}
>
>String dstName = bsh.args[argnum++];
>
>// Open a reader on every remaining argument (the source indexes).
>java.util.ArrayList readerList = new java.util.ArrayList();
>
>while( argnum < bsh.args.length ) {
>    String srcName = bsh.args[argnum++];
>    IndexReader reader = IndexReader.open(srcName);
>    print( srcName + ":\t" + reader.numDocs() + " documents");
>    readerList.add( reader );
>}
>
>IndexReader[] readerArray = new IndexReader[ readerList.size() ];
>for( int i = 0; i < readerArray.length; i++ )
>    readerArray[i] = (IndexReader)readerList.get(i);
>readerList = null;
>
>IndexWriter writer = null;
>
>try {
>    print( (create ? "Creating " : "Opening ") + dstName + " for merge");
>    writer = new IndexWriter(dstName, new StandardAnalyzer(), create);
>    if( readerArray.length > 0 ) {
>        t0 = System.currentTimeMillis();
>        c0 = writer.docCount();
>        print( dstName + ":\t" + c0 + " documents");
>        writer.addIndexes( readerArray );
>        t1 = System.currentTimeMillis();
>        c1 = writer.docCount();
>        print( "Index " + dstName + " went from " + c0 + " to " + c1 +
>               " (" + (c1 - c0) + ") documents in "
>               + (t1 - t0)/1000.0 + "sec" + " (i.e. " + ((t1 - t0) /
>               ((c1 - c0)*1.0)) + " millisec each)" );
>    }
>}
>catch( Exception ex ) {
>    ex.printStackTrace();
>}
>finally {
>    // Always release the writer and every source reader.
>    if( writer != null )
>        writer.close();
>    for( int i = 0; i < readerArray.length; i++ )
>        if( readerArray[i] != null )
>            readerArray[i].close();
>}
>
>----- /merge.bsh -----
>
>Also note that it is often much faster to merge n indexes into a new empty
>index (e.g. with the -create option) than to merge n-1 indexes into an
>existing index, due to the pre- and post-optimizations that addIndexes does.
>
>- J.J.
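
A footnote on the timing line in merge.bsh: it divides the elapsed time
by the number of documents added, which would yield Infinity or NaN when a
merge adds nothing (for example, when the source indexes are empty). Below
is a minimal sketch in plain Java of the same arithmetic with a guard. The
class name MergeTiming, its two methods, and the -1.0 sentinel are
hypothetical names chosen for illustration; none of them is part of Lucene
or BeanShell.

```java
// Hypothetical helper (not part of Lucene or BeanShell): reproduces the
// per-document timing arithmetic from merge.bsh with a zero-document guard.
class MergeTiming {

    /** Milliseconds per added document, or -1.0 if no documents were added. */
    static double millisPerDoc(long elapsedMillis, int docsBefore, int docsAfter) {
        int added = docsAfter - docsBefore;
        if (added <= 0) {
            return -1.0; // assumed sentinel: no rate can be computed
        }
        return elapsedMillis / (double) added;
    }

    /** Builds a one-line report in the style of the script's final print. */
    static String summary(String indexName, int docsBefore, int docsAfter,
                          long elapsedMillis) {
        StringBuilder sb = new StringBuilder();
        sb.append("Index ").append(indexName)
          .append(" went from ").append(docsBefore)
          .append(" to ").append(docsAfter)
          .append(" (").append(docsAfter - docsBefore).append(") documents in ")
          .append(elapsedMillis / 1000.0).append("sec");
        double rate = millisPerDoc(elapsedMillis, docsBefore, docsAfter);
        if (rate >= 0) {
            // Only report a per-document rate when something was merged.
            sb.append(" (i.e. ").append(rate).append(" millisec each)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example figures only; they are not measurements from a real merge.
        System.out.println(summary("dest", 100, 600, 2500L));
    }
}
```

In the script itself, the same effect could be had by wrapping the
per-document part of the print in a check that c1 is greater than c0.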