It depends on your total volume. If you are talking about many millions of records, I'd say use multicore and shards.
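To make the two options concrete, here is a rough sketch of what the requests might look like, assuming this is Solr and that a merged index carries a field named `type` (that field name, the host, and the core names below are made up for illustration, not taken from your setup). The first helper filters a single merged index by type with `fq`; the second queries several cores at once via the `shards` parameter.

```python
from urllib.parse import urlencode


def single_index_url(base, query, doc_type=None):
    """Build a select URL against one merged index.

    `type` is a hypothetical field holding the content type; when
    doc_type is given it is applied as a filter query (fq).
    """
    params = [("q", query), ("wt", "json")]
    if doc_type is not None:
        params.append(("fq", f"type:{doc_type}"))
    return f"{base}/select?{urlencode(params)}"


def sharded_url(base, query, shards):
    """Build a select URL that fans out to several cores.

    `shards` is a list of host:port/path entries (hypothetical here),
    joined with commas as the shards parameter expects.
    """
    params = [("q", query), ("wt", "json"), ("shards", ",".join(shards))]
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(single_index_url("http://localhost:8983/solr/all", "the office", "product"))
    print(sharded_url(
        "http://localhost:8983/solr/products",
        "the office",
        ["localhost:8983/solr/products", "localhost:8983/solr/articles"],
    ))
```

Either way it is a single HTTP request from the application's point of view, so the choice probably comes down more to index size and how differently you want to tune each type than to request overhead.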

On Wed, Jul 8, 2009 at 5:25 AM, Tim Sell <trs...@gmail.com> wrote:
> Hi,
> I am wondering if it is common to have just one very large index, or
> multiple smaller indexes specialized for different content types.
>
> We currently have multiple smaller indexes, although one of them is
> much larger than the others. We are considering merging them, to allow
> the convenience of searching across multiple types at once and getting
> them back in one list. The largest of the current indexes has a couple
> of types that belong together; it has just one text field, which is
> usually quite short and is similar to product names (words like "The"
> matter). Another index I would merge with this one has multiple text
> fields (also quite short).
>
> We of course would still like to be able to get specific types. Is
> filtering on just one type a big performance hit compared to
> just querying it from its own index? Bear in mind all these indexes
> run on the same machine. (We replicate them all to three machines and
> do load balancing.)
>
> There are a number of considerations. From an application standpoint,
> when querying across all types we may split the results out into the
> separate types anyway once we have the list back. If we always do
> this, is it silly to have them in one index, rather than query
> multiple indexes at once? Are multiple HTTP requests less significant
> than the time to post-split the results?
>
> In some ways it is easier to maintain a single index, although it has
> felt easier to optimize the results for the type of content if they
> are in separate indexes. My main concern with putting it all in one
> index is that we'll make it harder to work with. We will definitely
> want to do filtering on types sometimes, and if we go with a mashed-up
> index I'd prefer not to maintain separate specialized indexes as well.
>
> Any thoughts?
>
> ~Tim.