One problem is the IT logistics of handling the file set. At 200 million
records you have at least 20 GB of data in one Lucene index. It takes hours
to optimize an index that size, and tens of minutes to copy the optimized
index out to the query servers.
Another problem is that indexing speed drops off after the index reaches a
certain size. When making multiple indexes, you want to stop indexing and
roll over to a new index before you hit that size.
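
A rough sketch of that rollover, against the Lucene 2.x-era API (MAX_DOCS
and the paths here are placeholders, not our production values):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class RollingIndexer {
      // Placeholder cap: stop well before the size where throughput drops.
      private static final int MAX_DOCS = 180000000;
      private int generation = 0;
      private IndexWriter writer;

      private IndexWriter open() throws Exception {
        return new IndexWriter(
            FSDirectory.getDirectory("/indexes/gen" + generation),
            new StandardAnalyzer(), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
      }

      public void add(Document doc) throws Exception {
        if (writer == null) writer = open();
        writer.addDocument(doc);
        if (writer.maxDoc() >= MAX_DOCS) { // roll before the drop-off
          writer.optimize();  // the hours-long step at this scale
          writer.close();
          generation++;       // subsequent adds go to a fresh index
          writer = open();
        }
      }
    }
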
Lance

On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:

>
> :   We're doing similar thing with multi-core - when a core reaches
> : capacity (in our case 200 million records) we start a new core. We are
> : doing this via web service call (Create web service),
>
> this whole thread perplexes me ... while I can understand not wanting to
> let an index grow without bound because of hardware limitations, I don't
> understand what value you are gaining by creating a new core on the same
> box -- you're using the same physical resources to search the same number
> of documents. Making multiple cores for this actually seems like it would
> take up *more* resources to search the same amount of content, because the
> individual cores will be isolated and the term dictionaries can't be
> shared (not to mention you have to do a multi-shard query to get results
> from all the cores).
>
> Are you doing something special with the old cores vs. the new ones? (i.e.:
> create the new cores on new machines, shut down cores after a certain
> amount of time has expired, etc.)
>
>
> : > Hi there,
> : >
> : > Currently we want to add cores dynamically when the active one reaches
> : > some capacity. Can anyone give me some hints on how to achieve this
> : > functionality? (Just wondering if you have used shell scripting or have
> : > coded some 100% Java-based solution.)
> : >
> : > Thx
> : >
> : >
> : > --
> : > Lici
> : >
> :
>
>
>
> -Hoss
>
>
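
For reference, the "Create web service" mentioned above is presumably
Solr's CoreAdmin handler. A minimal SolrJ sketch, with a hypothetical
core name and instanceDir:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class CoreCreator {
      public static void main(String[] args) throws Exception {
        // Points at the Solr base URL; the request goes to /admin/cores.
        SolrServer admin =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Same as GET /solr/admin/cores?action=CREATE&name=...&instanceDir=...
        CoreAdminRequest.createCore("core1", "/solr/cores/core1", admin);
      }
    }

Querying across the resulting cores is then a standard distributed search
with the shards parameter, e.g.
shards=localhost:8983/solr/core0,localhost:8983/solr/core1.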


-- 
Lance Norskog
goks...@gmail.com
