Hi Dmitry,

>>The parameters you have mentioned -- termInfosIndexDivisor and
>>termIndexInterval -- are not found in the solr 1.4.1 config|schema. Are you
>>using SOLR 3.1?

I'm pretty sure that the termIndexInterval (ratio of tii file to tis file) is in the 1.4.1 example solrconfig.xml file, although I don't have a copy to check at the moment. We are using a 3.1 dev version. As for the termInfosIndexDivisor, I'm also pretty sure it works with 1.4.1, but you might have to ask the list to be sure. As you can see from the blog posts, those settings really reduced our memory requirements. We haven't been doing faceting, so we expect memory use to go up again once we add it, but at least we are starting at a 4GB baseline instead of a 20-32GB baseline.

>>Did you do logical sharding or document hash based?

On the indexing side we just assign documents to a particular shard on a round-robin basis and use a database to keep track of which document is in which shard, so if we need to update a document we update the right shard (see the "Forty days" article on the blog for a more detailed description and some diagrams). We hope that this distributes the documents evenly enough to avoid problems with Solr's lack of global idf.

>>Do you have load balancer between the front SOLR (or front entity) and shards,

As far as load balancing which shard is the head shard/front shard, again, our app layer just randomly picks one of the shards to be the head shard. We originally were going to run tests to determine whether it was better to have one dedicated machine configured to be the head shard, but never got around to that. We have a very low query request rate, so we haven't had to look seriously at load balancing.

>>do you do merging?

I'm not sure what you mean by "do you do merging". We are just using the default Solr distributed search. In theory our documents should be randomly distributed among the shards, so the lack of global idf should not hurt the merging process.

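To make the two app-layer pieces above a bit more concrete, here is a rough Python sketch of the idea, not our actual code. The shard host names, the `ShardDirectory` class, the `doc_shard` table, and the `distributed_query_target` helper are all made up for illustration, and an in-memory SQLite database stands in for the tracking database. The only Solr-specific piece is the standard `shards` request parameter used by distributed search.

```python
import itertools
import random
import sqlite3

# Hypothetical shard addresses; the real hosts are not part of this thread.
SHARDS = [f"shard{i}.example.org:8983/solr" for i in range(1, 4)]


class ShardDirectory:
    """Assigns new documents round robin and remembers where each one went."""

    def __init__(self, shards, db_path=":memory:"):
        self.shards = list(shards)
        # Endless 0, 1, 2, 0, 1, 2, ... rotation over shard indexes.
        self._next = itertools.cycle(range(len(self.shards)))
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS doc_shard "
            "(doc_id TEXT PRIMARY KEY, shard INTEGER)"
        )

    def shard_for(self, doc_id):
        """Return the shard index for doc_id, assigning one if it is new."""
        row = self.db.execute(
            "SELECT shard FROM doc_shard WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if row is not None:
            return row[0]  # update: go back to the shard holding the document
        shard = next(self._next)  # new document: next shard in the rotation
        self.db.execute("INSERT INTO doc_shard VALUES (?, ?)", (doc_id, shard))
        self.db.commit()
        return shard


def distributed_query_target(shards, rng=random):
    """Pick a random head shard and build a `shards` parameter listing them all."""
    head = rng.choice(shards)
    return f"http://{head}/select", {"shards": ",".join(shards)}
```

Calling `shard_for` twice with the same ID returns the same shard, which is what lets an update go to the right place. Note that the round-robin counter in this sketch starts over from the first shard whenever the process restarts.
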
Andrzej Bialecki gave a recent presentation on Solr distributed search that talks about less-than-optimal results merging and some ideas for dealing with it:
http://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/AndrzejBialecki-Buzzwords-2011_0.pdf

>>Each shard currently is allocated max 12GB memory.

I'm curious about how much memory you leave to the OS for disk caching. Can you give any details about the number of shards per machine and the total memory on the machine?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

________________________________________
From: Dmitry Kan [dmitry....@gmail.com]
Sent: Tuesday, June 14, 2011 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: huge shards (300GB each) and load balancing

Hi Tom,

Thanks a lot for sharing this. We have about half a terabyte total index size, and we have split our index over 10 shards (horizontal scaling, no replication). Each shard currently is allocated max 12GB memory. We use facet search a lot, and non-facet search with parameter values generated by facet search (hence more focused search that hits a small portion of the solr documents).

The parameters you have mentioned -- termInfosIndexDivisor and termIndexInterval -- are not found in the solr 1.4.1 config|schema. Are you using SOLR 3.1?

Did you do logical sharding or document hash based?

Do you have load balancer between the front SOLR (or front entity) and shards, do you do merging?