I have based my machines on bare-bones servers (I call them ghetto servers). I essentially have motherboards in a rack sitting on catering trays (heat resistance is key).
http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html

Motherboards: GIGABYTE GA-G33M-S2L (small mATX boards with 4 RAM slots, which allows as much cheap RAM as possible)
CPU: Intel Q6600 (quad-core 2.4GHz, but I might try AMD next to see if its different RAM approach works better, and they are greener)
Memory: 8GB (4 x 2GB DDR2, the best price per GB)
HDD: SATA disk (between 200 and 500GB; I had these from another project)

I have HAProxy between the app servers and Solr so that I get failover if one of these goes down (expect failure). Having only 1M documents but more data per document will mean your situation is different. I am having particular performance issues with facets and am trying to get my head around all the issues involved there.

I see Mike has only 2 shards per box, as he was "squeezing" performance. I didn't see any significant gain in performance, but that is not to say there isn't one. For me, I had a level of performance in mind and stopped when that was met. It took almost a month of testing to get to that point, so I was ready to move on to other problems; I might revisit it later.

Also, my ghetto servers are showing reliability similar to the Dell servers I have, but I have built the system with the expectation that they will fail often, although that has not happened yet.

On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
<[EMAIL PROTECTED]> wrote:
> As long as Solr/Lucene makes smart use of memory (and they do, in my
> experience), it is really easy to estimate how long a huge query/update
> will take when you know how long the smaller ones take. Just keep in
> mind that memory and disk space consumption are almost always
> proportional.
>
> 2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>
>>
>> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>>
>>>
>>> So your experience differs from Mike's. Obviously it's an important
>>> decision as to whether to buy more machines.
>>> Can you (or Mike) weigh in on what factors led to your different
>>> take on local shards vs. shards distributed across machines?
>>>
>>
>> I do both; the only reason I have two shards on each machine is to
>> squeeze maximum performance out of an equipment budget. Err on the
>> side of multiple machines.
>>
>>>> At least for building the index, the number of shards really does
>>>> help. Indexing Medline (1.6e7 docs, which is 60GB of XML text) on a
>>>> single machine starts at about 100 docs/s but slows down to 10 docs/s
>>>> as the index grows. It seems the limit is reached once you run out
>>>> of RAM, and it gets slower in a linear fashion the larger the index
>>>> gets. My sweet spot was 5 machines with 8GB RAM for indexing about
>>>> 60GB of data.
>>>>
>>>
>>> Can you say what the specs were for these machines? Given that I have
>>> more like 1TB of data over 1M docs, how do you think my machine
>>> requirements might be affected as compared to yours?
>>>
>>
>> You are in a much better position to determine this than we are. See
>> how big an index you can put on a single machine while maintaining
>> acceptable performance under a typical query load. It's relatively
>> safe to extrapolate linearly from that.
>>
>> -Mike
>>
>
> --
> Alexander Ramos Jardim

--
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 6333372
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1 (770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor
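[Editor's note: the failover setup Ian mentions, HAProxy sitting between the app servers and the Solr shards, could look roughly like the sketch below. The listen address, server names, IPs, and ports are invented for illustration; the post does not give them.]

```
# Hedged sketch of HAProxy (1.3-era syntax) fronting two Solr boxes.
# All names and addresses here are assumptions, not from the post.
listen solr 0.0.0.0:8983
    mode http
    balance roundrobin
    # Mark a shard down if Solr's ping handler stops answering
    option httpchk GET /solr/admin/ping
    server shard1 192.168.1.10:8983 check
    server shard2 192.168.1.11:8983 check
```

With `check` enabled, HAProxy polls each box's ping handler and routes queries around a server that stops responding, which is what gives the "expect failure" behaviour described above.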
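[Editor's note: Mike's capacity advice — load-test one machine, then extrapolate linearly — amounts to simple arithmetic. A minimal sketch, with all numbers invented for illustration:]

```python
def machines_needed(total_index_gb, gb_per_machine_at_ok_latency):
    """Linear extrapolation from a single-machine load test:
    ceiling of total index size over what one box handled well."""
    # -(-a // b) is integer ceiling division
    return -(-total_index_gb // gb_per_machine_at_ok_latency)

# e.g. a 1TB corpus, where one box served 60GB at acceptable latency:
print(machines_needed(1000, 60))  # 17
```

The linear assumption is only as good as the single-machine test, so it is worth measuring under a realistic query load, as Mike says.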