Re: Distribution and Tomcat

2006-06-26 Thread Bill Au

I added what I considered a first draft into the solrconfig.xml wiki.

Bill

On 6/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:



: You can put a load balancer in front of the pool of slave servers for
that.

Solr does have some features designed to make Load Balancing easy

  * "healthcheck" URLs that your LoadBalancer can query to determine when
it should add/remove a server from rotation

  * a pingQuery, which allows you to control in the solrconfig.xml what
query should be executed when a LoadBalancer (or anyone) hits the
/admin/ping URL, for checking the response time of various slaves if you
want "response time" load balancing.

Neither of which seem to be documented very well in the Wiki...

Bill, do you think maybe you could add a little bit on each of these to
the SolrConfigXml wiki page?
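
A rough sketch of what "response time" load balancing against the
/admin/ping URL could look like from the load balancer's side. Everything
except the /admin/ping path is an assumption made for illustration: the
class and method names are hypothetical, the base URLs are whatever your
slaves are deployed under, and the sketch assumes an unhealthy slave
answers with a non-200 status or not at all.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;

class PingCheckSketch {
    /**
     * Times one GET of baseUrl + "/admin/ping".
     * Returns elapsed milliseconds, or -1 if the slave is unreachable
     * or does not answer with HTTP 200 (assumed to mean "unhealthy").
     */
    static long pingMillis(String baseUrl, int timeoutMillis) {
        long start = System.nanoTime();
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create(baseUrl + "/admin/ping").toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            conn.disconnect();
            if (status != HttpURLConnection.HTTP_OK) {
                return -1;
            }
        } catch (Exception e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Picks the slave with the lowest non-negative timing, or null if none is healthy. */
    static String fastest(Map<String, Long> timings) {
        String best = null;
        long bestMillis = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            long ms = e.getValue();
            if (ms >= 0 && ms < bestMillis) {
                best = e.getKey();
                bestMillis = ms;
            }
        }
        return best;
    }
}
```

A caller would fill the timings map by calling pingMillis once per slave
and then route the next request to whatever fastest returns.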


-Hoss




Re: Faceted Browsing questions

2006-06-26 Thread Erik Hatcher


On Jun 24, 2006, at 4:29 PM, Yonik Seeley wrote:

On 6/24/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:

This weekend :)   I have imported more data than my hacked
implementation can handle without bumping up Jetty's JVM heap size,
so I'm now at the point where it is necessary for me to start using
the LRUCache.  Though I have already refactored to use OpenBitSet
instead of BitSet.


You can also fit more in mem if you can use DocSet (HashDocSet) for
smaller sets.  This will also speed up intersection counts.  This is
done automatically when you get the DocSet from Solr, or if numDocs()
is used.


Thanks for this advice, Yonik.   I've refactored (but not committed  
yet, for those that may be looking to see what I've done) the  
caching.  The cache (currently a single HashMap) is keyed by
field name, with nested HashMaps keyed by field value.  The inner
map used to contain BitSets, then OpenBitSets, but now it contains
only TermQuery instances.  Now I simply use
SolrIndexSearcher.getDocSet(query) and rely on the existing query
caching.  The only thing my custom cache puts into RAM now is this
HashMap of all faceted fields, values, and associated TermQuery
instances.  At some point that might even become an issue, but maybe
not.
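
A minimal sketch of the cache shape described above, not the actual
uncommitted code. To keep it runnable without Lucene on the classpath,
the type parameter Q stands in for TermQuery, and the "counter" callback
stands in for whatever turns a query into a count (for example, fetching
the DocSet for the query from the searcher and intersecting it with the
current result set). The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Outer map keyed by field name, inner map keyed by field value. */
class FacetQueryCacheSketch<Q> {
    private final Map<String, Map<String, Q>> byField = new HashMap<>();

    /** Remembers the query that selects documents with field == value. */
    void put(String field, String value, Q query) {
        byField.computeIfAbsent(field, f -> new HashMap<>()).put(value, query);
    }

    /** All value -> query pairs for one faceted field (empty if unknown). */
    Map<String, Q> queriesFor(String field) {
        return byField.getOrDefault(field, Collections.<String, Q>emptyMap());
    }

    /** Computes one count per field value by handing each cached query to the counter. */
    Map<String, Integer> counts(String field, ToIntFunction<Q> counter) {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Q> e : queriesFor(field).entrySet()) {
            out.put(e.getKey(), counter.applyAsInt(e.getValue()));
        }
        return out;
    }
}
```

Only small query objects live in this map; the expensive per-query
document sets stay in the searcher's own caches.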


It may not even be necessary to cache this type of lookup since it is  
simply a TermEnum through specific fields in the index.  Maybe simply  
doing the TermEnum in the request handler instead of iterating  
through a cache would be just as fast or faster.  Any thoughts on that?


Either way, at the moment things are screaming fast and memory is  
pleasantly under control.


My next challenge is to re-implement the catch-all facets that I used
to do by unioning all documents in an (Open)BitSet and inverting it.
How can I invert a DocSet (I realize I can get the bits and do it
that way, but is there a better way)?
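
For reference, the bit-level approach mentioned above looks roughly like
this with plain java.util.BitSet (used here only so the sketch runs
without Lucene or Solr; OpenBitSet would be the analogous choice). The
class name, method name, and the maxDoc parameter are illustrative. One
caveat, stated as an assumption: a plain inversion also sets the bits of
deleted documents, so a real implementation would presumably need to mask
those out.

```java
import java.util.BitSet;
import java.util.List;

class CatchAllFacetSketch {
    /**
     * Returns the doc ids in [0, maxDoc) that appear in none of the given
     * facet sets: union everything, then invert within the id range.
     */
    static BitSet unmatched(List<BitSet> facetSets, int maxDoc) {
        BitSet union = new BitSet(maxDoc);
        for (BitSet s : facetSets) {
            union.or(s);
        }
        union.flip(0, maxDoc); // invert only the valid doc id range
        return union;
    }
}
```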


Erik



autowarmCount usefulness

2006-06-26 Thread Erik Hatcher
I'm trying to fully understand the LRUCache and the autowarmCount  
parameter.   Why does it make sense to auto-warm filters and query  
results?   In my case, if a new document is added it may invalidate  
many filters, and it would require knowing the details of the  
documents added/removed to know which caches could be copied.


Can someone shed light on the scenarios where blindly copying over  
any cached filters (or query results) makes sense?


Thanks,
Erik



Re: autowarmCount usefulness

2006-06-26 Thread Chris Hostetter

: I'm trying to fully understand the LRUCache and the autowarmCount
: parameter.   Why does it make sense to auto-warm filters and query
: results?   In my case, if a new document is added it may invalidate
: many filters, and it would require knowing the details of the
: documents added/removed to know which caches could be copied.
:
: Can someone shed light on the scenarios where blindly copying over
: any cached filters (or query results) makes sense?

Autowarming of the filterCache and queryResultCache doesn't just copy the
cached values -- it reexecutes the queries used as the keys for those
caches and generates new DocSet/DocLists using the *new* searcher, before
that searcher is made available to threads serving queries over HTTP.

For named User caches, autowarming doesn't work at all unless you've
specified a regenerator -- which can do whatever it wants using the new
searcher and the information from the old cache.

The documentCache doesn't support autowarming at all (because the key is
doc id, and as you say: those change with every commit).


The reason autowarming is configured using an autowarmCount is so you can
control just how much effort Solr should put into the autowarming of the
new cache ... if you've got a limitless supply of RAM, and an index that
doesn't change very often, you can make your caches so big that no
DocSet/DocList is ever generated dynamically more than once -- but what
happens when your index does finally change? ... if your autowarmCount
is the same as the size of your index, Solr could spend days autowarming
every query ever executed against your index, even if it was only executed
one time 3 weeks ago.  The autowarmCount tells Solr to only warm the N
"best" keys in the cache, where "best" is defined by the Cache
implementation (for an LRUCache, the "best" things are the things most
recently used).
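
To make the "warm the N most recently used keys" idea concrete, here is a
small sketch built on an access-ordered java.util.LinkedHashMap. It is not
Solr's LRUCache; the class and method names are hypothetical, and the
"newSearcher" function stands in for re-executing a cached query against
the new searcher.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AutowarmSketch {
    /** Access-ordered map used as an LRU cache: iteration runs from least to most recently used. */
    static <K, V> LinkedHashMap<K, V> newLruCache(final int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /**
     * Builds a fresh cache for the new searcher. Old values are never copied;
     * only the autowarmCount most recently used keys are re-executed.
     */
    static <K, V> LinkedHashMap<K, V> warm(LinkedHashMap<K, V> old, int autowarmCount,
                                           int maxSize, Function<K, V> newSearcher) {
        List<K> keys = new ArrayList<>(old.keySet()); // least recent first
        int n = Math.max(0, Math.min(autowarmCount, keys.size()));
        LinkedHashMap<K, V> fresh = newLruCache(maxSize);
        for (K key : keys.subList(keys.size() - n, keys.size())) {
            fresh.put(key, newSearcher.apply(key)); // regenerate against the new index
        }
        return fresh;
    }
}
```

With an autowarmCount of 0 the new cache simply starts empty, and with a
count larger than the old cache every old key is re-executed, which is
the expensive case described above.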


Once upon a time Yonik and I hypothesized that it would be cool to have
autowarmTimelimit and autowarmPercentage (of current size) params and some
other things like that so you could have other ways of tweaking just how
much autowarming is done on your behalf ... but they were never built.



-Hoss



Re: Distribution and Tomcat

2006-06-26 Thread Jeff Rodenburg

That's great information, thanks Bill.

On 6/26/06, Bill Au <[EMAIL PROTECTED]> wrote:


I added what I considered a first draft into the solrconfig.xml wiki.

Bill

On 6/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : You can put a load balancer in front of the pool of slave servers for
> that.
>
> Solr does have some features designed to make Load Balancing easy
>
>   * "healthcheck" URLs that your LoadBalancer can query to determine when
> it should add/remove a server from rotation
>
>   * a pingQuery, which allows you to control in the solrconfig.xml what
> query should be executed when a LoadBalancer (or anyone) hits the
> /admin/ping URL, for checking the response time of various slaves if you
> want "response time" load balancing.
>
> Neither of which seem to be documented very well in the Wiki...
>
> Bill, do you think maybe you could add a little bit on each of these to
> the SolrConfigXml wiki page?
>
>
> -Hoss
>
>




Re: Faceted Browsing questions

2006-06-26 Thread Chris Hostetter

: It may not even be necessary to cache this type of lookup since it is
: simply a TermEnum through specific fields in the index.  Maybe simply
: doing the TermEnum in the request handler instead of iterating
: through a cache would be just as fast or faster.  Any thoughts on that?

While commuting I've been letting my brain bounce around various ideas
for a completely generic totally reusable faceting request handler, and
I've been mulling over the same question ... my current theory is that it
might make sense to cache a bounded Priority queue of the Terms for each
faceting field where the priority is determined by the docFreq, and the
size is configurable.  That way you can start with the values in the
queue and if/when you reach a point where the docFreq of the next item in
the queue is less than the lowest intersection count you've found so far,
and you already have as many items as you want to display, you don't have
to bother checking all of the other values (and you don't have to bother
with the TermEnum unless you completely exhaust the queue)
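
A sketch of that theory in plain Java, so it runs without Lucene. The
names (TopFacetSketch, TermFreq, boundedByDocFreq, topFacets) are made up
for illustration, and the "intersectionCount" callback stands in for
counting how many documents of the current result set contain a term. The
early exit relies on the fact that an intersection count can never exceed
the term's docFreq.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.ToIntFunction;

class TopFacetSketch {
    static final class TermFreq {
        final String term;
        final int docFreq;
        TermFreq(String term, int docFreq) { this.term = term; this.docFreq = docFreq; }
    }

    /** The cached part: keep only the maxSize terms with the highest docFreq, highest first. */
    static List<TermFreq> boundedByDocFreq(List<TermFreq> all, int maxSize) {
        PriorityQueue<TermFreq> heap =
                new PriorityQueue<>(Comparator.comparingInt((TermFreq t) -> t.docFreq));
        for (TermFreq t : all) {
            heap.add(t);
            if (heap.size() > maxSize) heap.poll(); // drop the lowest docFreq
        }
        List<TermFreq> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingInt((TermFreq t) -> t.docFreq).reversed());
        return sorted;
    }

    /** The per-request part: top "wanted" terms by intersection count, highest count first. */
    static Map<String, Integer> topFacets(List<TermFreq> byDocFreqDesc, int wanted,
                                          ToIntFunction<String> intersectionCount) {
        Map<String, Integer> out = new LinkedHashMap<>();
        if (wanted <= 0) return out;
        // Min-heap on count: the head is the lowest count among the current best.
        PriorityQueue<Map.Entry<String, Integer>> best = new PriorityQueue<>(
                Comparator.comparingInt((Map.Entry<String, Integer> e) -> e.getValue()));
        for (TermFreq t : byDocFreqDesc) {
            // No remaining term can beat the current lowest winner, so stop.
            if (best.size() == wanted && t.docFreq < best.peek().getValue()) break;
            best.add(new AbstractMap.SimpleEntry<>(t.term, intersectionCount.applyAsInt(t.term)));
            if (best.size() > wanted) best.poll();
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(best);
        sorted.sort(Comparator.comparingInt((Map.Entry<String, Integer> e) -> e.getValue()).reversed());
        for (Map.Entry<String, Integer> e : sorted) out.put(e.getKey(), e.getValue());
        return out;
    }
}
```

If the loop runs off the end of the queue without hitting the early exit,
the result is only guaranteed for the terms that were in the queue; that
is the case where falling back to the TermEnum would be needed.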

: My next challenge is to re-implement the catch-all facets that I used
: to do by unioning all documents in an (Open)BitSet and inverting it.
: How can I invert a DocSet (I realize I can get the bits and do it
: that way, but is there a better way)?

well, the most obvious solution i can think of would be a patch adding an
invert() method to DocSet, HashDocSet and BitDocSet.   :)

there was some discussion about this on the list previously if i recall
correctly.


-Hoss