Sorry, I didn't make myself clear. I have 20 machines in the configuration; each shard/replica is on its own machine.
On 14 February 2014 19:44, Shawn Heisey <s...@elyograg.org> wrote:

> On 2/14/2014 5:28 AM, Annette Newton wrote:
> > Solr Version: 4.3.1
> > Number Shards: 10
> > Replicas: 1
> > Heap size: 15GB
> > Machine RAM: 30GB
> > Zookeeper timeout: 45 seconds
> >
> > We are continuing the fight to keep our Solr setup functioning. As a result of this we have made significant changes to our schema to reduce the amount of data we write. I set up a new cluster to reindex our data; initially I ran the import with no replicas and achieved quite impressive results. Our peak was 60,000 new documents per minute, with no shard losses and no outages due to garbage collection (which is an issue we see in production). At the end of the load the index stood at 97,000,000 documents and 20GB per shard. During the highest insertion rate I would say that querying suffered, but that is not of concern right now.
>
> Solr 4.3.1 has a number of problems when it comes to large clouds. Upgrading to 4.6.1 would be strongly advisable, but that's only something to try after looking into the rest of what I have to say.
>
> If I read what you've written correctly, you are running all this on one machine. To put it bluntly, this isn't going to work well unless you put a LOT more memory into that machine.
>
> For good performance, Solr relies on the OS disk cache, because reading from the disk is VERY expensive in terms of time. The OS will automatically use RAM that's not being used for other purposes for the disk cache, so that it can avoid reading off the disk as much as possible.
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Below is a summary of what that wiki page says, with your numbers as I understand them. If I am misunderstanding your numbers, then this advice may need adjustment. Note that when I see "one replica" I take that to mean replicationFactor=1, so there is only one copy of the index. If you actually mean that you have *two* copies, then you have twice as much data as I've indicated below, and your requirements will be even larger.
>
> With ten shards that are each 20GB in size, your total index size is 200GB. With 15GB of heap, your ideal memory size for that server would be 215GB -- the 15GB heap plus enough extra to fit the entire 200GB index into RAM.
>
> In reality you probably don't need that much, but it's likely that you would need at least half the index to fit into RAM at any one moment, which adds up to 115GB. If you're prepared to deal with moderate-to-severe performance problems, you MIGHT be able to get away with only 25% of the index fitting into RAM, which still requires 65GB of RAM, but with SolrCloud such performance problems usually mean that the cloud won't be stable, so it's not advisable to even try it.
>
> One of the bits of advice on the wiki page is to split your index into shards and put it on more machines, which drops the memory requirements for each machine. You're already using a multi-shard SolrCloud, so you probably just need more hardware. If you had one 20GB shard on a machine with 30GB of RAM, you could probably use a heap size of 4-8GB per machine and have plenty of RAM left over to cache the index very well. You could most likely add another 50% to the index size and still be OK.
>
> Thanks,
> Shawn
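On the disk cache point, if it helps to see where the RAM on each box is actually going, something along these lines (a rough sketch, assuming a Linux host; it just parses /proc/meminfo) will show how much memory the OS is currently devoting to the page cache versus sitting genuinely free:

fields = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, value = line.split(":", 1)
        fields[key] = int(value.split()[0])  # /proc/meminfo reports values in kB

gb = 1024 * 1024
print("Total RAM:  %.1f GB" % (fields["MemTotal"] / gb))
print("Page cache: %.1f GB" % (fields["Cached"] / gb))
print("Truly free: %.1f GB" % (fields["MemFree"] / gb))

A large "Cached" figure is normal and desirable here; that is the memory Shawn is describing, and it shrinks as the heap and other processes grow.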
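For reference, the sizing arithmetic above works out like this (a back-of-the-envelope sketch using the figures quoted in this thread: 10 shards of roughly 20GB each and a 15GB heap; the 50% and 25% ratios are the rules of thumb from the wiki page, not hard limits, and the variable names are just for illustration):

shards = 10
shard_size_gb = 20       # per-shard index size at the end of the load
heap_gb = 15             # Solr heap on the box

index_gb = shards * shard_size_gb       # 200 GB total index
ideal_gb = heap_gb + index_gb           # 215 GB: whole index fits in the disk cache
workable_gb = heap_gb + index_gb // 2   # 115 GB: about half the index cached
risky_gb = heap_gb + index_gb // 4      # 65 GB: a quarter cached, often unstable

print(index_gb, ideal_gb, workable_gb, risky_gb)   # 200 215 115 65

Run per machine instead (one 20GB shard, a 4-8GB heap, 30GB of RAM), the same arithmetic is what makes the one-shard-per-box layout comfortable.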
--
Annette Newton
Database Administrator
ServiceTick Ltd
T: +44 (0)1603 618326
Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
www.servicetick.com
www.sessioncam.com