Seems good. Thanks
cheers
Y.
J.J. Larrea wrote:
At 9:51 PM -0700 10/7/07, Chris Hostetter wrote:
: Thanks for the pointer. After two silent days waiting for reply,
: I decided to implement a command line for that. Works like a charm !!!
well, sometimes people just don't post because they don't know the
answer to something (better than 50 people posting "i don't know").
And then there's the case of people who intend to respond and sometimes even take the
time to start on a response, but then get sidetracked by other things, and by the time
they get around to hitting "send" the point has already been mooted by other
posters... there's so much useful and quick commentary on solr-user one has to be quick!
Thus in light of your having given up and written your own merger, and Hoss'
mention of the built-in org.apache.lucene.misc.IndexMergeTool (which I wish I'd
known about), I didn't send the following, which I'm only sending now in case
someone else might find BeanShell a useful tool for rapid prototyping, and with
apologies for cluttering the list with something which is technically now
off-topic.
- J.J.
At 1:23 PM +0200 10/6/07, Ycrux wrote:
Is there a simple way (or command line tool)
to merge different Solr indexes (located on different machines)
into one ?
Perhaps someone else can think of a way to do this that capitalizes on
Solr-specific features, but it can certainly be done using standard Lucene
calls if the remote index is accessible as a filesystem mount, e.g. via NFS.
I have a lot of little Lucene helper scripts written in BeanShell (available
and documented at http://www.beanshell.org/), which saves the bother of
figuring out where to put the classes and get them in my classpath. BSH is
interpreted, but since all of the work is done in Lucene code there's no
performance issue. All you need is a bsh.jar and a lucene.jar in the classpath.
----- merge.bsh -----
#!/usr/bin/java bsh.Interpreter
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

if( bsh.args.length < 2 ) {
    print( "Usage: Merge [-create] <dest-index> <src-index> [ <src-index2> ... ]" );
    return(-1);
}

// Optional leading -create flag: start a new destination index
int argnum = 0;
boolean create = false;
if( "-create".equals(bsh.args[argnum]) ) {
    create = true; ++argnum;
}
String dstName = bsh.args[argnum++];

// Open a reader on each source index and report its size
java.util.ArrayList readerList = new java.util.ArrayList();
while( argnum < bsh.args.length ) {
    String srcName = bsh.args[argnum++];
    IndexReader reader = IndexReader.open(srcName);
    print( srcName + ":\t" + reader.numDocs() + " documents");
    readerList.add( reader );
}
IndexReader[] readerArray = new IndexReader[ readerList.size() ];
for( int i = 0; i < readerArray.length; i++ )
    readerArray[i] = (IndexReader)readerList.get(i);
readerList = null;

IndexWriter writer = null;
try {
    print( (create ? "Creating " : "Opening ") + dstName + " for merge");
    writer = new IndexWriter(dstName, new StandardAnalyzer(), create);
    if( readerArray.length > 0 ) {
        // Time the merge and count the documents it adds
        t0 = System.currentTimeMillis();
        c0 = writer.docCount();
        print( dstName + ":\t" + c0 + " documents");
        writer.addIndexes( readerArray );
        t1 = System.currentTimeMillis();
        c1 = writer.docCount();
        print( "Index " + dstName + " went from " + c0 + " to " + c1
            + " (" + (c1 - c0) + ") documents in "
            + (t1 - t0)/1000.0 + " sec" + " (i.e. "
            + ((t1 - t0) / ((c1 - c0)*1.0)) + " millisec each)" );
    }
}
catch( Exception ex ) {
    ex.printStackTrace();
}
finally {
    // Close the writer first, then every source reader
    if( writer != null )
        writer.close();
    for( int i = 0; i < readerArray.length; i++ )
        if( readerArray[i] != null )
            readerArray[i].close();
}
----- /merge.bsh -----
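In case anyone wants to port the script to plain Java, the argument handling can be pulled out and checked without Lucene on the classpath. What follows is only a rough sketch: the class name MergeArgs and its fields are made up for illustration and are not part of Lucene or BeanShell. It is also slightly stricter than the script, in that it insists on at least one source index even when -create is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Lucene or BeanShell): mirrors the
// argument handling of merge.bsh so it can be exercised without Lucene.
final class MergeArgs {
    final boolean create;        // true when -create was the first argument
    final String dest;           // destination index directory
    final List<String> sources;  // source index directories, in order

    private MergeArgs(boolean create, String dest, List<String> sources) {
        this.create = create;
        this.dest = dest;
        this.sources = sources;
    }

    // Returns null when the arguments do not match
    // "[-create] <dest-index> <src-index> [<src-index2> ...]".
    static MergeArgs parse(String[] args) {
        int argnum = 0;
        boolean create = false;
        if (args.length > 0 && "-create".equals(args[0])) {
            create = true;
            argnum++;
        }
        // Assumption: require a destination plus at least one source.
        if (args.length - argnum < 2) {
            return null;
        }
        String dest = args[argnum++];
        List<String> sources =
            new ArrayList<String>(Arrays.asList(args).subList(argnum, args.length));
        return new MergeArgs(create, dest, sources);
    }

    public static void main(String[] args) {
        MergeArgs parsed = parse(args);
        if (parsed == null) {
            System.out.println(
                "Usage: Merge [-create] <dest-index> <src-index> [ <src-index2> ... ]");
            return;
        }
        System.out.println((parsed.create ? "Creating " : "Opening ")
            + parsed.dest + " for merge of " + parsed.sources.size() + " index(es)");
    }
}
```

The Lucene calls themselves (IndexReader.open, IndexWriter.addIndexes) would stay exactly as in the script; only the parsing is separated here so that it can be tested on its own.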
Also note that it is often much faster to merge n indexes into a new empty
index (e.g. with the -create option) than to merge n-1 indexes into an existing
index, due to the pre- and post-optimizations that addIndexes does.
- J.J.