Re: Distribution and Tomcat
I added what I considered a first draft into the solrconfig.xml wiki.

Bill

On 6/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : You can put a load balancer in front of the pool of slave servers for that.
>
> Solr does have some features designed to make Load Balancing easy:
>
> * "healthcheck" URLs that your LoadBalancer can query to determine when it
>   should add/remove a server from rotation
>
> * a pingQuery which allows you to control in solrconfig.xml what query
>   should be executed when a LoadBalancer (or anyone) hits the /admin/ping
>   URL, for checking the response time of various slaves if you want
>   "response time" load balancing.
>
> Neither of which seems to be documented very well in the Wiki... Bill, do
> you think maybe you could add a little bit on each of these to the
> SolrConfigXml wiki page?
>
> -Hoss
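For reference, a minimal sketch of what those two settings might look like in the <admin> section of solrconfig.xml (element names and values here are illustrative of the config layout at this time, not authoritative):

```xml
<admin>
  <!-- query run when a load balancer (or anyone) hits /admin/ping;
       pick something representative of real traffic if you want
       "response time" load balancing -->
  <pingQuery>q=solr&amp;version=2.0&amp;start=0&amp;rows=0</pingQuery>

  <!-- healthcheck: /admin/ping fails unless this file exists, so
       removing the file takes the server out of the LB rotation -->
  <healthcheck type="file">server-enabled</healthcheck>
</admin>
```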
Re: Faceted Browsing questions
On Jun 24, 2006, at 4:29 PM, Yonik Seeley wrote:
> On 6/24/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > This weekend :) I have imported more data than my hacked implementation
> > can handle without bumping up Jetty's JVM heap size, so I'm now at the
> > point where it is necessary for me to start using the LRUCache. Though I
> > have already refactored to use OpenBitSet instead of BitSet.
>
> You can also fit more in mem if you can use DocSet (HashDocSet) for
> smaller sets. This will also speed up intersection counts. This is done
> automatically when you get the DocSet from Solr, or if numDocs() is used.

Thanks for this advice, Yonik. I've refactored the caching (but not committed yet, for those that may be looking to see what I've done). The cache (currently a single HashMap) is keyed by field name, with nested HashMaps keyed by field value. The inner map used to contain BitSets, then OpenBitSets, but now it contains only TermQuerys. Now I simply use SolrIndexSearcher.getDocSet(query) and rely on the existing query caching. The only thing my custom cache puts into RAM now is this HashMap of all faceted fields, values, and associated TermQuerys. At some point that might even become an issue, but maybe not.

It may not even be necessary to cache this type of lookup, since it is simply a TermEnum through specific fields in the index. Maybe simply doing the TermEnum in the request handler instead of iterating through a cache would be just as fast or faster. Any thoughts on that?

Either way, at the moment things are screaming fast and memory is pleasantly under control. My next challenge is to re-implement the catch-all facets that I used to do by unioning all documents in an (Open)BitSet and inverting it. How can I invert a DocSet (I realize I can get the bits and do it that way, but is there a better way)?

Erik
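Lacking an invert() on DocSet itself, the bits-based approach Erik mentions amounts to a range flip over [0, maxDoc). A minimal sketch of just that idea in plain Java (the InvertDemo class and invert helper are made up for illustration; they are not Solr API):

```java
import java.util.BitSet;

public class InvertDemo {
    // Invert a set of document ids represented as a BitSet over [0, maxDoc):
    // every doc in the original set is dropped, every doc not in it is added.
    static BitSet invert(BitSet docs, int maxDoc) {
        BitSet inverted = (BitSet) docs.clone();
        inverted.flip(0, maxDoc);
        return inverted;
    }

    public static void main(String[] args) {
        BitSet docs = new BitSet();
        docs.set(1);
        docs.set(3);
        // docs = {1, 3} over a 5-doc index; the inverse is {0, 2, 4}
        System.out.println(invert(docs, 5));
    }
}
```

Note that in a real index you would also want to mask out deleted docs, since the flip turns their bits on too.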
autowarmCount usefulness
I'm trying to fully understand the LRUCache and the autowarmCount parameter. Why does it make sense to auto-warm filters and query results? In my case, if a new document is added it may invalidate many filters, and it would require knowing the details of the documents added/removed to know which caches could be copied. Can someone shed light on the scenarios where blindly copying over any cached filters (or query results) makes sense? Thanks, Erik
Re: autowarmCount usefulness
: I'm trying to fully understand the LRUCache and the autowarmCount
: parameter. Why does it make sense to auto-warm filters and query
: results? In my case, if a new document is added it may invalidate
: many filters, and it would require knowing the details of the
: documents added/removed to know which caches could be copied.
:
: Can someone shed light on the scenarios where blindly copying over
: any cached filters (or query results) makes sense?

Autowarming of the filterCache and queryResultCache doesn't just copy the cached values -- it re-executes the queries used as the keys for those caches and generates new DocSets/DocLists using the *new* searcher, before that searcher is made available to threads serving queries over HTTP.

For named user caches, autowarming doesn't work at all unless you've specified a regenerator -- which can do whatever it wants using the new searcher and the information from the old cache. The documentCache doesn't support autowarming at all (because the key is the doc id, and as you say: those change with every commit).

The reason autowarming is configured using an autowarmCount is so you can control just how much effort Solr should put into the autowarming of the new cache. If you've got a limitless supply of RAM, and an index that doesn't change very often, you can make your caches so big that no DocSet/DocList is ever generated dynamically more than once -- but what happens when your index does finally change? If your autowarmCount is the same as the size of your index, Solr could spend days autowarming every query ever executed against your index, even if it was only executed one time 3 weeks ago. The autowarmCount tells Solr to only warm the N "best" keys in the cache, where "best" is defined by the cache implementation (for an LRUCache, the "best" things are the things most recently used).
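For context, the caches and their autowarmCount are configured in solrconfig.xml along these lines (the sizes below are just placeholder values, not recommendations):

```xml
<!-- on commit, re-execute the queries behind the 256 most recently
     used filter entries against the new searcher before it goes live -->
<filterCache
  class="solr.search.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="256"/>

<queryResultCache
  class="solr.search.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="128"/>
```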
Once upon a time Yonik and I hypothesized that it would be cool to have autowarmTimelimit and autowarmPercentage (of current size) params, and some other things like that, so you could have other ways of tweaking just how much autowarming is done on your behalf ... but they were never built.

-Hoss
Re: Distribution and Tomcat
That's great information, thanks Bill

On 6/26/06, Bill Au <[EMAIL PROTECTED]> wrote:
> I added what I considered a first draft into the solrconfig.xml wiki.
>
> Bill
>
> On 6/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> > : You can put a load balancer in front of the pool of slave servers for
> > that.
> >
> > Solr does have some features designed to make Load Balancing easy:
> >
> > * "healthcheck" URLs that your LoadBalancer can query to determine when
> >   it should add/remove a server from rotation
> >
> > * a pingQuery which allows you to control in solrconfig.xml what query
> >   should be executed when a LoadBalancer (or anyone) hits the /admin/ping
> >   URL, for checking the response time of various slaves if you want
> >   "response time" load balancing.
> >
> > Neither of which seems to be documented very well in the Wiki... Bill,
> > do you think maybe you could add a little bit on each of these to the
> > SolrConfigXml wiki page?
> >
> > -Hoss
Re: Faceted Browsing questions
: It may not even be necessary to cache this type of lookup since it is
: simply a TermEnum through specific fields in the index. Maybe simply
: doing the TermEnum in the request handler instead of iterating
: through a cache would be just as fast or faster. Any thoughts on that?

While commuting I've been letting my brain bounce around various ideas for a completely generic, totally reusable faceting request handler, and I've been mulling over the same question. My current theory is that it might make sense to cache a bounded priority queue of the Terms for each faceting field, where the priority is determined by the docFreq and the size is configurable. That way you can start with the values in the queue, and if/when you reach a point where the docFreq of the next item in the queue is less than the lowest intersection count you've found so far, and you already have as many items as you want to display, you don't have to bother checking all of the other values (and you don't have to bother with the TermEnum unless you completely exhaust the queue).

: My next challenge is to re-implement the catch-all facets that I used
: to do by unioning all documents in an (Open)BitSet and inverting it.
: How can I invert a DocSet (I realize I can get the bits and do it
: that way, but is there a better way)?

Well, the most obvious solution I can think of would be a patch adding an invert() method to DocSet, HashDocSet, and BitDocSet. :) There was some discussion about this on the list previously, if I recall correctly.

-Hoss
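The pruning idea above can be sketched in plain Java. Assume the cached queue holds (term, docFreq) pairs sorted by descending docFreq, and that docFreq is an upper bound on any intersection count. Everything here (the FacetPrune class, the counts map standing in for real intersection counts) is hypothetical illustration, not Solr code:

```java
import java.util.*;

public class FacetPrune {

    record TermDf(String term, int df) {}

    // Return the k facet values with the highest intersection counts,
    // skipping the tail of the scan: once we hold k candidates and the
    // next docFreq can't beat our k-th best count, no remaining term
    // (all with docFreq <= the current one) can change the result.
    static List<String> topFacets(List<TermDf> byDescendingDf,
                                  Map<String, Integer> intersectionCounts,
                                  int k) {
        // min-heap of (term, count), smallest count on top
        PriorityQueue<Map.Entry<String, Integer>> best =
            new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (TermDf t : byDescendingDf) {
            if (best.size() >= k && t.df() <= best.peek().getValue())
                break; // docFreq bounds the intersection count
            int count = intersectionCounts.getOrDefault(t.term(), 0);
            best.offer(Map.entry(t.term(), count));
            if (best.size() > k) best.poll();
        }
        List<String> result = new ArrayList<>();
        while (!best.isEmpty()) result.add(best.poll().getKey());
        Collections.reverse(result); // highest count first
        return result;
    }
}
```

With terms a..d having docFreqs 10, 8, 3, 2 and intersection counts 5, 4, 3, 2, asking for the top 2 stops the scan at "c" (its docFreq of 3 can't beat the current second-best count of 4).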