Hi Nico, I don't think there is a tool to split an existing Lucene index, though I imagine one could write such a tool using http://lucene.apache.org/java/2_3_1/fileformats.html as a guide.
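One way such a tool could work (a hypothetical sketch, not an existing utility): duplicate the index once per target shard, then delete from each copy the documents that do not belong to that shard. The sketch below illustrates the idea with plain Java collections standing in for the index; a real tool would do the same thing with Lucene's IndexReader/IndexWriter, using the file-formats document above as a guide. The modulo-on-user-ID partitioning matches the scheme described in the quoted message.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /**
     * Partition documents (represented here just by their user IDs) into
     * `shards` buckets. For each shard: "copy" the whole index, then
     * "delete" every document whose user ID doesn't hash to that shard.
     */
    static List<List<Integer>> split(List<Integer> docsByUserId, int shards) {
        List<List<Integer>> result = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<Integer> copy = new ArrayList<>(docsByUserId); // copy step
            final int shard = s;
            copy.removeIf(userId -> userId % shards != shard);  // delete step
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        // Six documents for users 10..15, split across 2 shards.
        List<Integer> docs = List.of(10, 11, 12, 13, 14, 15);
        System.out.println(split(docs, 2)); // [[10, 12, 14], [11, 13, 15]]
    }
}
```

The copy-then-delete approach is wasteful on disk during the split (each copy starts at full size), but it avoids touching the file format directly; an optimize/merge afterwards would reclaim the space left by the deletions.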
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Nico Heid <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, April 29, 2008 4:10:09 AM
> Subject: Index splitting
>
> Hi,
> Let me first roughly describe the scenario :-)
>
> We're trying to index online-stored data for some thousand users.
> The schema.xml has a custom identifier for the user, so an fq (filter
> query) can be applied and further filtering is done per user (more
> importantly, a user doesn't get to see results from data not belonging
> to him).
>
> Unfortunately, the index might become quite big (we're indexing more
> than 50 TB of data, all kinds of files: full text (indexed only, not
> stored) where possible, otherwise file info (size, date) and metadata
> if available).
>
> So the question is:
>
> We're thinking of starting out with multiple Solr instances (either in
> their own containers or via MultiCore; that's probably not the
> important point) on 1 to n machines. Let's just pretend we do modulo 5
> on the user number and assign each user to one of the machines. The
> index gets distributed to query slaves (1 to m, depending on need).
>
> So now the question:
> Is there a way to split a too-big index into smaller ones? Do I have
> to create more instances at the beginning, so that I won't run out of
> capacity and space? (That would add quite a bit of data redundancy.)
> Let's say I miscalculated and used only 2 indices, but now I see I
> need at least 4.
>
> Any idea will be very welcome.
>
> Thanks,
> Nico
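For reference, the modulo routing Nico describes can be sketched in a few lines: map a user number to one of N Solr instances and attach the per-user filter query. The field name `user_id` and the host URLs below are hypothetical placeholders, not part of the original setup.

```java
public class UserShardRouter {
    private final String[] solrHosts;

    UserShardRouter(String[] solrHosts) {
        this.solrHosts = solrHosts;
    }

    /** Pick the shard for a user: user number modulo the number of instances. */
    String hostFor(int userId) {
        return solrHosts[userId % solrHosts.length];
    }

    /** Build a query URL whose fq restricts results to the user's own documents. */
    String queryUrl(int userId, String query) {
        return hostFor(userId) + "/select?q=" + query + "&fq=user_id:" + userId;
    }

    public static void main(String[] args) {
        UserShardRouter router = new UserShardRouter(
                new String[] {"http://solr1:8983/solr", "http://solr2:8983/solr"});
        // User 7 lands on shard 7 % 2 = 1, i.e. solr2.
        System.out.println(router.queryUrl(7, "report"));
        // http://solr2:8983/solr/select?q=report&fq=user_id:7
    }
}
```

Note that changing the modulus later (say, going from 2 shards to 4) reroutes most users to a different instance, which is exactly why splitting an already-built index after the fact is the painful part of this scheme.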