Ah sorry, I had misread your original post.  3-6M docs per hour can be
challenging.
Using the CSV loader, I've indexed 4000 docs per second (14M per hour)
on a 2.6GHz Athlon, but they were relatively simple and small docs.
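
For comparing these figures, here is a rough sketch of the rate arithmetic. The function names are made up for illustration, and the 3M per hour target is taken from the quoted message below; nothing here measures Solr itself.

```python
SECONDS_PER_HOUR = 3600

def docs_per_hour(docs_per_second):
    """Convert a sustained per-second indexing rate to docs per hour."""
    return docs_per_second * SECONDS_PER_HOUR

def required_docs_per_second(docs_per_hour_target):
    """Sustained per-second rate needed to hit an hourly target."""
    return docs_per_hour_target / SECONDS_PER_HOUR

# 4000 docs/sec, the CSV loader figure above
print(docs_per_hour(4000))                         # 14400000

# 3M docs/hour, the peak from the quoted message
print(round(required_docs_per_second(3_000_000)))  # 833
```

So a 3M per hour peak works out to roughly 833 docs per second sustained, assuming the load is spread evenly over the hour.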

On Fri, Nov 28, 2008 at 9:54 PM, souravm <[EMAIL PROTECTED]> wrote:
> There is a case where I'm expecting around 36M docs per day at peak season,
> peaking at 2-3M per hour. I need to do some processing of those docs before
> I index them. Based on the indexing performance figures I saw in
> http://wiki.apache.org/solr/SolrPerformanceFactors (the embedded vs http post
> section), it looks like it would take more than 2 hours to index 3M records
> using 4 machines. So I thought it would be difficult to achieve my goal
> through Solr alone, and that I need something else to further increase the
> parallel processing.
>
> Altogether, the targeted number of docs would average around 3B (the size
> would be around 300 Gb).

You definitely need distributed search.  Don't try to search this on a
single box.

> The docs would be constantly added and deleted on a daily basis, at an
> average rate of 8M per day, peaking at 36M. Now, considering around 10 boxes,
> every box needs to store around 250M docs.

250M docs per box is probably too high, even for distributed search,
unless your query throughput and latency requirements are very low.
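
A back-of-the-envelope sketch of the shard sizing, assuming docs are spread evenly across boxes. The helper names are hypothetical, and the 50M per box ceiling in the last line is only a placeholder to show how the box count scales, not a recommendation.

```python
def docs_per_box(total_docs, boxes):
    """Docs each box holds if the index is split evenly (rounded up)."""
    return -(-total_docs // boxes)

def boxes_needed(total_docs, max_docs_per_box):
    """Smallest number of boxes that keeps each box at or under the ceiling."""
    return -(-total_docs // max_docs_per_box)

TOTAL_DOCS = 3_000_000_000  # the 3B figure from the quoted message

print(docs_per_box(TOTAL_DOCS, 10))           # 300000000
print(boxes_needed(TOTAL_DOCS, 250_000_000))  # 12
print(boxes_needed(TOTAL_DOCS, 50_000_000))   # 60 (placeholder ceiling)
```

With exactly 10 boxes, 3B docs comes to 300M per box; the 250M figure corresponds to about 12 boxes.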

-Yonik
