Ah sorry, I had misread your original post. 3-6M docs per hour can be challenging. Using the CSV loader, I've indexed 4000 docs per second (14M per hour) on a 2.6GHz Athlon, but they were relatively simple and small docs.
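To put the numbers side by side, here is a small Python sketch that converts the 4000 docs/sec figure into docs per hour and shows what sustained rate a given hourly target needs. It assumes a steady rate with no pre-processing cost. The 4000 docs/sec figure came from small, simple docs on one machine, so larger docs or extra processing will lower it.

```python
SECONDS_PER_HOUR = 3600

def docs_per_hour(docs_per_second: float) -> float:
    """Convert a sustained indexing rate in docs/sec to docs/hour."""
    return docs_per_second * SECONDS_PER_HOUR

def required_docs_per_second(target_docs_per_hour: float) -> float:
    """Sustained docs/sec needed to hit an hourly indexing target."""
    return target_docs_per_hour / SECONDS_PER_HOUR

MEASURED_DPS = 4000  # CSV loader, small simple docs, single 2.6GHz Athlon

print(f"{MEASURED_DPS} docs/sec = {docs_per_hour(MEASURED_DPS):,.0f} docs/hour")
for target in (2e6, 3e6, 6e6):
    need = required_docs_per_second(target)
    # Fraction of the measured single-box rate this target would consume
    print(f"{target / 1e6:.0f}M docs/hour needs {need:,.0f} docs/sec "
          f"({need / MEASURED_DPS:.0%} of the measured rate)")
```

Even the 6M/hour case needs well under half of that single-box rate for simple docs. Whether real docs get close to it depends on their size and on the processing done before indexing.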
On Fri, Nov 28, 2008 at 9:54 PM, souravm <[EMAIL PROTECTED]> wrote:
> There is a case where I'm expecting around 36M docs per day at peak season, with hourly peaks of 2-3M docs per hour. I need to do some processing of those docs before I index them. Based on the indexing performance figures I saw at http://wiki.apache.org/solr/SolrPerformanceFactors (the embedded vs. HTTP post section), it looks like it would take more than 2 hours to index 3M records using 4 machines. So I thought it would be difficult to achieve my goal through Solr alone, and that I need something else to further increase the parallel processing.
>
> Altogether the targeted doc count would average around 3B (around 300 Gb in size).

You definitely need distributed search. Don't try to search this on a single box.

> The docs would be constantly added and deleted every day, at an average rate of 8M per day, with peaks of 36M. Assuming around 10 boxes, every box would need to store around 250M docs.

250M docs per box is probably too high, even for distributed search, unless your query throughput and latency requirements are very low.

-Yonik
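A rough Python sketch of the per-box sizing arithmetic for an evenly split index follows. The per-box ceilings in the loop are hypothetical placeholders for illustration, not recommendations. A workable limit depends on doc size, hardware, and query latency/throughput targets. The sketch also ignores any replicas added for query throughput.

```python
def docs_per_box(total_docs: int, num_boxes: int) -> int:
    """Docs each box holds if the index is split evenly (rounded up)."""
    return -(-total_docs // num_boxes)  # ceiling division

def boxes_needed(total_docs: int, max_docs_per_box: int) -> int:
    """Boxes (shards) required so that no box exceeds max_docs_per_box."""
    return -(-total_docs // max_docs_per_box)  # ceiling division

TOTAL_DOCS = 3_000_000_000  # ~3B docs, from the quoted message

print(f"10 boxes -> {docs_per_box(TOTAL_DOCS, 10):,} docs per box")

# Hypothetical per-box ceilings, purely illustrative.
for ceiling in (25_000_000, 50_000_000, 100_000_000):
    print(f"ceiling {ceiling:,} docs/box -> "
          f"{boxes_needed(TOTAL_DOCS, ceiling)} boxes")
```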