One problem is the IT logistics of handling the file set. At 200 million records you have at least 20 GB of data in one Lucene index. It takes hours to optimize an index that size, and tens of minutes to copy the optimized index out to the query servers. Another problem is that indexing speed drops off after the index reaches a certain size; when making multiple indexes, you want to stop indexing before you hit that point.

Lance
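A minimal SolrJ sketch of such a rollover check, for anyone curious. The URL, core names, and the rolloverIfFull helper are made up for illustration, and the exact threshold should come from your own benchmarks; the CoreAdmin CREATE request it issues is the standard Solr way to add a core at runtime.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CoreRollover {

        // Stop indexing into a core at roughly the capacity discussed in
        // this thread; tune the number from your own indexing benchmarks.
        private static final long MAX_DOCS = 200000000L;

        /** Creates newCore if activeCore has reached the document threshold. */
        public static void rolloverIfFull(String solrBase, String activeCore,
                                          String newCore) throws Exception {
            SolrServer active =
                new CommonsHttpSolrServer(solrBase + "/" + activeCore);

            // A match-all query with rows=0 returns just numFound -- the
            // cheapest way to count documents in the core.
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0);
            long numDocs = active.query(q).getResults().getNumFound();

            if (numDocs >= MAX_DOCS) {
                // CoreAdmin requests go to the container URL, not a core URL.
                SolrServer admin = new CommonsHttpSolrServer(solrBase);
                CoreAdminRequest.createCore(newCore, newCore, admin);
            }
        }
    }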
On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

> : We're doing a similar thing with multi-core - when a core reaches
> : capacity (in our case 200 million records) we start a new core. We are
> : doing this via web service call (Create web service),
>
> this whole thread perplexes me ... while i can understand not wanting to
> let an index grow without bound because of hardware limitations, i don't
> understand what value you are gaining by creating a new core on the same
> box -- you're using the same physical resources to search the same number
> of documents; making multiple cores for this actually seems like it would
> take up *more* resources to search the same amount of content, because the
> individual cores will be isolated and the term dictionaries can't be
> shared (not to mention you have to do a multi-shard query to get results
> from all the cores)
>
> are you doing something special with the old cores vs the new ones? (ie:
> create the new cores on new machines, shut down cores after a certain
> amount of time has expired, etc...)
>
> : > Hi there,
> : >
> : > currently we want to add cores dynamically when the active one reaches
> : > some capacity. Can anyone give me some hints to achieve this
> : > functionality? (Just wondering if you have used shell scripting or
> : > coded a 100% Java-based solution)
> : >
> : > Thx
> : >
> : > --
> : > Lici
>
> -Hoss

--
Lance Norskog
goks...@gmail.com
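For reference, the multi-shard query Hoss mentions is Solr's distributed search: you pass a shards parameter listing every core, one core coordinates, fans the query out, and merges the results. A sketch with SolrJ follows; the host, port, core names, and query are made up for illustration.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MultiCoreQuery {
        public static void main(String[] args) throws Exception {
            // Any one core can coordinate the distributed request.
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr/core0");

            SolrQuery q = new SolrQuery("title:lucene");
            // shards entries are host:port/path, one per core to search.
            q.set("shards",
                  "localhost:8983/solr/core0,localhost:8983/solr/core1");

            QueryResponse rsp = server.query(q);
            System.out.println("total hits across cores: "
                    + rsp.getResults().getNumFound());
        }
    }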