On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote:
> Documents: We have close to ~12 million XML docs of varying sizes, average
> size 20 KB. These documents have 150 fields, which should be searchable &
> indexed. [...] Approximately ~6000 such documents are updated & 400-800 new
> ones are added each day
>
> Queries: [...] Also each one would want to grab as many result rows as
> possible (we are limiting this to 2000). The output shall contain only 1-5
> fields.
Except for the result rows (which I guess is equal to returned documents in
Solr-world), nothing you say raises any alarms. It actually sounds very much
like our local index (~10M documents, ~100 fields, 10,000+ updates/day) at
the State and University Library, Denmark.

> Available hardware:
> Some of the existing hardware we could find consists of ~300 GB SAN each
> on 4 boxes with ~96 GB each. We also have a couple of older HP DL380s
> (mainly intended for offline indexing). All of this is on 10G Ethernet.

Yikes! We only use two mirrored machines for fallback, not performance. They
have 16 GB each and handle index updates as well as searches. The indexes
(~60 GB) reside on local SSDs.

> Questions:
> Our priority is to provide results fast, [...]

What is fast in milliseconds, and how many queries/second do you anticipate?
From what you're telling, your hardware looks like overkill. However, as
Eric says, your mileage may vary: try stuffing all your data into your
mock-up and see what happens - it shouldn't take long, and you might
discover that your test machine is perfectly capable of handling it all
alone.
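If you don't want to wait for a dump of the real corpus before trying that,
you can fake one. Below is a minimal Python sketch (field names and sizes are
my assumptions, matched to your numbers: 150 fields, ~20 KB per document)
that generates synthetic documents in Solr's XML <add> format, ready to POST
to the /update handler or feed to the post tool:

```python
import random
import string
from xml.etree import ElementTree as ET

def make_doc(doc_id, n_fields=150, target_bytes=20_000):
    """Build one synthetic Solr <doc> with an id plus n_fields text
    fields, totalling roughly target_bytes of field content.
    Field names like field_0 .. field_149 are placeholders - use
    whatever your real schema defines."""
    doc = ET.Element("doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = str(doc_id)
    per_field = max(1, target_bytes // n_fields)
    for i in range(n_fields):
        f = ET.SubElement(doc, "field", name=f"field_{i}")
        f.text = "".join(
            random.choices(string.ascii_lowercase + " ", k=per_field)
        )
    return doc

def make_add_batch(start_id, count):
    """Wrap count docs in an <add> envelope, as accepted by Solr's
    /update handler, and return it as an XML string."""
    add = ET.Element("add")
    for i in range(start_id, start_id + count):
        add.append(make_doc(i))
    return ET.tostring(add, encoding="unicode")

if __name__ == "__main__":
    batch = make_add_batch(0, 10)
    print(f"batch of 10 docs, {len(batch)} bytes")
```

Random lowercase text indexes faster and compresses better than real prose,
so treat the resulting index size and indexing speed as optimistic; it is
still good enough to smoke-test whether one box copes with 12M documents.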