2008/8/20 Ian Connor <[EMAIL PROTECTED]>

> So, because the OS is doing the caching in RAM, it means I could have
> 6 Jetty servers per machine all pointing to the same data. Once the
> index is built, I can load up some more servers on different ports and
> it will boost performance.
>
> That does sound promising - thanks for the tip. What made you pick 6?
>

Each WebLogic instance sits on top of a JVM with a 2GB heap. Each cluster
node has 16GB of RAM.

> On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim
> <[EMAIL PROTECTED]> wrote:
>> Another thing to consider for your sharding is the access rate you want
>> to guarantee.
>>
>> In the project I am working on, I need to guarantee at least
>> 200 hits/second with various facets in all queries.
>>
>> I am not using sharding, but I have 6 Solr instances per cluster node,
>> and I have 3 nodes, for a total of 18 Solr instances. Each node has only
>> one index, so I keep the 6 instances pointing to the same index on a
>> given node. What made a huge difference in my performance was the
>> removal of the lock.
>>
>> I expect that helps you out.
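
To make the shared-index idea above concrete, here is a minimal Lucene-level
sketch: several read-only readers opened over one index directory with the
lock factory disabled, so the OS page cache is shared between them. The index
path is a placeholder, the API is current Lucene rather than the 2008-era
calls, and this only illustrates the principle - in Solr itself the data
directory and lock type are configuration, not application code.

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NoLockFactory;

public class SharedIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path; in the setups above each node keeps one index
        // on local disk.
        String indexPath = "/var/data/solr/index";

        // Two directories over the same files, with locking disabled.
        // Read-only readers never take the write lock, so many of them can
        // coexist, and the OS page cache is shared between all of them.
        Directory dirA = FSDirectory.open(Paths.get(indexPath), NoLockFactory.INSTANCE);
        Directory dirB = FSDirectory.open(Paths.get(indexPath), NoLockFactory.INSTANCE);

        IndexSearcher searcherA = new IndexSearcher(DirectoryReader.open(dirA));
        IndexSearcher searcherB = new IndexSearcher(DirectoryReader.open(dirB));

        // Both searchers see the same committed index.
        System.out.println("A sees " + searcherA.getIndexReader().numDocs() + " docs");
        System.out.println("B sees " + searcherB.getIndexReader().numDocs() + " docs");
    }
}

Only searchers can safely share an index this way; anything that writes still
needs the lock, which is why it is the read-only query instances that get
multiplied per node.
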
>> 2008/8/20 Ian Connor <[EMAIL PROTECTED]>
>>
>>> I have based my machines on bare-bones servers (I call them ghetto
>>> servers). I essentially have motherboards in a rack sitting on
>>> catering trays (heat resistance is key).
>>>
>>> http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
>>>
>>> Motherboards: GIGABYTE GA-G33M-S2L (small mATX boards with 4 RAM
>>> slots - allows as much cheap RAM as possible)
>>> CPU: Intel Q6600 (quad-core 2.4GHz - but I might try AMD next to see
>>> if the different RAM approach works better, and they are greener)
>>> Memory: 8GB (4 x 2GB DDR2 - best price per GB)
>>> HDD: SATA disk (between 200 and 500GB - I had these from another project)
>>>
>>> I have HAProxy between the app servers and Solr so that I get failover
>>> if one of these goes down (expect failure).
>>>
>>> Having only 1M documents but more data per document will mean your
>>> situation is different. I am having particular performance issues with
>>> facets and am trying to get my head around all the issues involved there.
>>>
>>> I see Mike has only 2 shards per box as he was "squeezing" performance.
>>> I didn't see any significant gain in performance, but that is not to say
>>> there isn't one. Just for me, I had a level of performance in mind and
>>> stopped when that was met. It took almost a month of testing to get to
>>> that point, so I was ready to move on to other problems - I might
>>> revisit it later.
>>>
>>> Also, my ghetto servers are getting similar reliability to the Dell
>>> servers I have - but I have built the system with the expectation that
>>> they will fail often, although that has not happened yet.
>>>
>>> On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
>>> <[EMAIL PROTECTED]> wrote:
>>>> As long as Solr/Lucene make smart use of memory (and they do, in my
>>>> experience), it is really easy to calculate how long a huge
>>>> query/update will take when you know how long the smaller ones take.
>>>> Just keep in mind that the consumption of memory and disk space is
>>>> almost always proportional.
>>>>
>>>> 2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>>>>
>>>>> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>>>>>
>>>>>> So your experience differs from Mike's. Obviously it's an important
>>>>>> decision as to whether to buy more machines. Can you (or Mike) weigh
>>>>>> in on what factors led to your different take on local shards vs.
>>>>>> shards distributed across machines?
>>>>>
>>>>> I do both; the only reason I have two shards on each machine is to
>>>>> squeeze maximum performance out of an equipment budget. Err on the
>>>>> side of multiple machines.
>>>>>
>>>>>>> At least for building the index, the number of shards really does
>>>>>>> help. Indexing Medline (1.6e7 docs, about 60GB of XML text) on a
>>>>>>> single machine starts at about 100 docs/s but slows down to
>>>>>>> 10 docs/s as the index grows. It seems as though the limit is
>>>>>>> reached once you run out of RAM, and it gets slower and slower in a
>>>>>>> linear fashion the larger the index gets.
>>>>>>> My sweet spot was 5 machines with 8GB RAM for indexing about 60GB
>>>>>>> of data.
>>>>>>
>>>>>> Can you say what the specs were for these machines? Given that I
>>>>>> have more like 1TB of data over 1M docs, how do you think my machine
>>>>>> requirements might be affected as compared to yours?
>>>>>
>>>>> You are in a much better position to determine this than we are. See
>>>>> how big an index you can put on a single machine while maintaining
>>>>> acceptable performance under a typical query load. It's relatively
>>>>> safe to extrapolate linearly from that.
>>>>>
>>>>> -Mike
>>>>
>>>> --
>>>> Alexander Ramos Jardim
>>>
>>> --
>>> Regards,
>>> Ian Connor
>>
>> --
>> Alexander Ramos Jardim
>
> --
> Regards,
>
> Ian Connor
> 1 Leighton St #605
> Cambridge, MA 02141
> Direct Line: +1 (978) 6333372
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Mobile Phone: +1 (312) 218 3209
> Fax: +1 (770) 818 5697
> Suisse Phone: +41 (0) 22 548 1664
> Skype: ian.connor

--
Alexander Ramos Jardim
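
When the index is split across machines rather than duplicated on one, Solr's
distributed search is driven by the shards request parameter: any instance can
act as the coordinator, fanning the query out to the listed shards and merging
the results. Below is a minimal sketch of such a query over plain HTTP; the
hostnames, port, and /solr path (solr1-solr3, 8983) are placeholders, not the
actual machines described in the thread.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ShardedQuerySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder shard addresses; substitute the real host:port/path list.
        String shards = "solr1:8983/solr,solr2:8983/solr,solr3:8983/solr";
        String enc = StandardCharsets.UTF_8.name();

        // Any one instance coordinates: it forwards the query to every shard
        // in the shards parameter and merges the partial results.
        URL url = new URL("http://solr1:8983/solr/select"
                + "?q=" + URLEncoder.encode("title:lucene", enc)
                + "&shards=" + URLEncoder.encode(shards, enc)
                + "&rows=10");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Whether those shards live on one box or several is transparent to the query,
which is why Mike's advice reduces to measuring what one machine can hold at
acceptable query latency and extrapolating linearly from there.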