On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote:
> Documents: We have close to ~12 million XML docs of varying sizes, average
> size 20 KB. These documents have 150 fields, which should be searchable &
> indexed. [...] Approximately ~6000 such documents are updated & 400-800 new
> ones are added each day
>
> Queries: [...] Also each one would want to grab as many result rows as
> possible (we are limiting this to 2000). The output shall contain only 1-5
> fields.
Except for the result rows (which I guess is equal to returned documents in
Solr-world), nothing you say raises any alarms. It actually sounds very much
like our local index (~10M documents, ~100 fields, 10,000+ updates/day) at
the State and University Library, Denmark.

> Available hardware:
> Some of the existing hardware we could find consists of ~300 GB SAN each
> on 4 boxes with ~96 GB each. We also have a couple of older HP DL380s
> (mainly intended for offline indexing). All of this is on 10G Ethernet.

Yikes! We only use two mirrored machines for fallback, not performance. They
have 16 GB each and handle index updates as well as searches. The indexes
(~60 GB) reside on local SSDs.

> Questions:
> Our priority is to provide results fast, [...]

What is fast in milliseconds, and how many queries/second do you anticipate?
From what you're telling, your hardware looks like overkill. However, as
Eric says, your mileage may vary: try stuffing all your data into your
mock-up and see what happens - it shouldn't take long, and you might
discover that your test machine is perfectly capable of handling it all
alone.
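If you don't want to wait for a dump of the real corpus before trying that,
you can fake one. Below is a minimal Python sketch (field names and sizes are
my assumptions, matched to your numbers: 150 fields, ~20 KB per document)
that generates synthetic documents in Solr's XML <add> format, ready to POST
to the /update handler or feed to the post tool:

```python
import random
import string
from xml.etree import ElementTree as ET

def make_doc(doc_id, n_fields=150, target_bytes=20_000):
    """Build one synthetic Solr <doc> with an id plus n_fields text
    fields, totalling roughly target_bytes of field content.
    Field names like field_0 .. field_149 are placeholders - use
    whatever your real schema defines."""
    doc = ET.Element("doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = str(doc_id)
    per_field = max(1, target_bytes // n_fields)
    for i in range(n_fields):
        f = ET.SubElement(doc, "field", name=f"field_{i}")
        f.text = "".join(
            random.choices(string.ascii_lowercase + " ", k=per_field)
        )
    return doc

def make_add_batch(start_id, count):
    """Wrap count docs in an <add> envelope, as accepted by Solr's
    /update handler, and return it as an XML string."""
    add = ET.Element("add")
    for i in range(start_id, start_id + count):
        add.append(make_doc(i))
    return ET.tostring(add, encoding="unicode")

if __name__ == "__main__":
    batch = make_add_batch(0, 10)
    print(f"batch of 10 docs, {len(batch)} bytes")
```

Random lowercase text indexes faster and compresses better than real prose,
so treat the resulting index size and indexing speed as optimistic; it is
still good enough to smoke-test whether one box copes with 12M documents.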