So, because the OS is doing the caching in RAM, I could have 6 Jetty servers per machine all pointing to the same data. Once the index is built, I can load up some more servers on different ports and that should boost performance.
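Something like this is what I have in mind - just a sketch, with
illustrative ports and hypothetical paths for the Solr home and the
shared data directory:

import subprocess

SOLR_EXAMPLE = "/opt/solr/example"   # hypothetical install path
PORTS = range(8983, 8989)            # six instances on consecutive ports

for port in PORTS:
    # One Jetty per port; every instance points at the same index files,
    # so the OS page cache is shared between them once it is warm.
    subprocess.Popen(
        ["java",
         "-Djetty.port=%d" % port,
         "-Dsolr.solr.home=/var/solr/home",  # same Solr home for all six
         "-Dsolr.data.dir=/var/solr/data",   # same index for all six
         "-jar", "start.jar"],
        cwd=SOLR_EXAMPLE)

Only the indexer would write; these extra instances are read-only
searchers, which is also why dropping the lock (as you describe below)
matters.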
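And since HAProxy already sits between the app servers and Solr, adding
instances is mostly a matter of listing the new ports in its backend.
A client-side version of the same failover idea, just to illustrate
(treat the select URL and params as assumptions):

import itertools
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import urlopen

PORTS = list(range(8983, 8989))   # the six instances started above
_rotation = itertools.cycle(PORTS)

def query(q):
    """Round-robin a query across the local instances, skipping any
    that are down - a stand-in for what HAProxy does for us."""
    params = urlencode({"q": q, "wt": "json"})
    for _ in PORTS:
        port = next(_rotation)
        url = "http://localhost:%d/solr/select?%s" % (port, params)
        try:
            return urlopen(url, timeout=5).read()
        except (URLError, OSError):
            continue  # dead instance; try the next port
    raise RuntimeError("no Solr instance answered")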
That does sound promising - thanks for the tip. What made you pick 6?

On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim
<[EMAIL PROTECTED]> wrote:
> Another thing to consider in your sharding is the access rate you want
> to guarantee.
>
> In the project I am working on, I need to guarantee at least
> 200 hits/second with various facets in all queries.
>
> I am not using sharding, but I have 6 Solr instances per cluster node,
> and I have 3 nodes, for a total of 18 Solr instances. Each node has
> only one index, so I keep the 6 instances pointing to the same index on
> a given node. What made a huge difference in my performance was the
> removal of the lock.
>
> I expect that helps you out.
>
> 2008/8/20 Ian Connor <[EMAIL PROTECTED]>
>
>> I have based my machines on bare-bones servers (I call them ghetto
>> servers). I essentially have motherboards in a rack sitting on
>> catering trays (heat resistance is key).
>>
>> http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
>>
>> Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX boards with
>> 4 RAM slots - allows as much cheap RAM as possible)
>> CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see
>> if the different RAM approach works better, and they are greener)
>> Memory: 8GB (4 x 2GB DDR2 - best price per GB)
>> HDD: SATA disk (between 200 and 500GB - I had these from another
>> project)
>>
>> I have HAProxy between the app servers and Solr so that I get
>> failover if one of these goes down (expect failure).
>>
>> Having only 1M documents but more data per document will mean your
>> situation is different. I am having particular performance issues
>> with facets and am trying to get my head around all the issues
>> involved there.
>>
>> I see Mike has only 2 shards per box, as he was "squeezing"
>> performance. I didn't see any significant gain in performance, but
>> that is not to say there isn't one. I just had a level of performance
>> in mind and stopped when that was met. It took almost a month of
>> testing to get to that point, so I was ready to move on to other
>> problems - I might revisit it later.
>>
>> Also, my ghetto servers are getting similar reliability to the Dell
>> servers I have - but I have built the system with the expectation
>> that they will fail often, although that has not happened yet.
>>
>> On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
>> <[EMAIL PROTECTED]> wrote:
>> > As long as Solr/Lucene make smart use of memory (and they do, in my
>> > experience), it is really easy to calculate how long a huge
>> > query/update will take when you know how long the smaller ones
>> > take. Just keep in mind that the resource consumption of memory and
>> > disk space is almost always proportional.
>> >
>> > 2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>> >
>> >> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>> >>
>> >>> So your experience differs from Mike's. Obviously it's an
>> >>> important decision as to whether to buy more machines. Can you
>> >>> (or Mike) weigh in on what factors led to your different take on
>> >>> local shards vs. shards distributed across machines?
>> >>>
>> >>
>> >> I do both; the only reason I have two shards on each machine is to
>> >> squeeze maximum performance out of an equipment budget. Err on the
>> >> side of multiple machines.
>> >>
>> >>>> At least for building the index, the number of shards really
>> >>>> does help.
>> >>>> Indexing Medline (1.6e7 docs, which is 60GB of XML text) on a
>> >>>> single machine starts at about 100 doc/s but slows down to
>> >>>> 10 doc/s as the index grows. It seems as though the limit is
>> >>>> reached once you run out of RAM, and it gets slower and slower
>> >>>> in a linear fashion the larger the index gets.
>> >>>> My sweet spot was 5 machines with 8GB RAM for indexing about
>> >>>> 60GB of data.
>> >>>>
>> >>>
>> >>> Can you say what the specs were for these machines? Given that I
>> >>> have more like 1TB of data over 1M docs, how do you think my
>> >>> machine requirements might be affected as compared to yours?
>> >>>
>> >>
>> >> You are in a much better position to determine this than we are.
>> >> See how big an index you can put on a single machine while
>> >> maintaining acceptable performance using a typical query load.
>> >> It's relatively safe to extrapolate linearly from that.
>> >>
>> >> -Mike
>> >>
>> >
>> > --
>> > Alexander Ramos Jardim
>>
>> --
>> Regards,
>>
>> Ian Connor
>
> --
> Alexander Ramos Jardim

--
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 6333372
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1 (770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor