It very much depends on your data and on which query features you will use:
how many fields, the size of each field, how many unique values per field, how
many fields are stored vs. only indexed, etc. I have a system with 3+ billion
docs, and each instance (each index core) has 120 million docs.
OK, my bad. I should have put it better.
Is it a good idea to have all 30M docs on a single instance, or should I
consider a distributed set-up?
I have synthesized the data, configured the schema, and made
suitable changes to the config. I have tested it with a smaller data set.
Your question is really unanswerable; there are about a zillion
factors that could influence the answer. I can index 5-7K docs/second,
so for me it's "efficient". Others can index only a fraction of that. It all depends...
"Try it and see" is about the only way to answer.
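
For a rough sense of scale only, here is a back-of-envelope sketch in Python. It
assumes a steady indexing rate, which real systems rarely hold, and it borrows the
5-7K docs/second figure from above, which may not apply to your data or hardware.
The function name is hypothetical and nothing here talks to Solr; measure your own
rate on a sample and plug it in.

```python
def estimate_index_seconds(num_docs, docs_per_second):
    """Rough wall-clock seconds to index num_docs at a constant rate.

    Assumption: the rate stays constant for the whole run. In practice it
    varies with document size, analysis chain, commits and merges.
    """
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return num_docs / docs_per_second


if __name__ == "__main__":
    total_docs = 30_000_000  # the 30M docs asked about in this thread
    # 5-7K docs/second is one person's measured rate, used here as an assumption.
    for rate in (5_000, 7_000):
        minutes = estimate_index_seconds(total_docs, rate) / 60
        print(f"{rate} docs/s: about {minutes:.0f} minutes for {total_docs} docs")
```

If your measured rate is only a fraction of that, the same arithmetic shows how
quickly a full reindex grows, which is one input into the single-instance vs.
distributed decision.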
Best
Erick
On Thu, Mar 8, 2012 at