Hi Janne,

I *think* Ocean Realtime Search has been superseded by Lucene NRT search.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Janne Majaranta <janne.majara...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, February 18, 2010 2:12:37 AM
> Subject: Re: Realtime search and facets with very frequent commits
>
> Hi,
>
> Yes, I did play with mergeFactor.
> I didn't play with mergePolicy.
>
> Wouldn't that affect indexing speed and possibly memory usage?
> I don't have any problems with indexing speed (1000 - 2000 docs / sec
> via the standard HTTP API).
>
> My problem is that I need very warm caches to get fast faceting, and
> the autowarming of the caches takes too long compared to the frequency
> of commits I'm having.
> So a commit every minute means less than a minute to warm the caches.
>
> To give you an idea of what kind of queries need to be autowarmed in my
> app, the logevents indexed as documents have timestamps with different
> granularity used for faceting.
> For example, to get the count of logevents for every hour using
> faceting there's a timestamp field with the format yyyymmddhh (for
> example: 2010021808 meaning 2010-02-18 8am).
> One use case is to get hourly counts over the whole index. A non-cached
> query counting the hourly counts over the 40M documents index takes a
> while.
> And to my understanding autowarming means something like that this kind
> of query would be basically re-executed against a cold cache. Probably
> not exactly how it works, but it "feels" like it would.
>
> Moving the commits to a smaller index while using sharding to have a
> transparent view to the index from the client app seems to solve my
> problem.
>
> I'm not sure if the (upcoming?) NRT features would keep the caches more
> persistent, probably not in an environment where docs get frequent
> updates / deletes.
>
> Also, I'm closely following the Ocean Realtime Search project AND its
> SOLR integration. It sounds like it has the "dream features" to enable
> realtime updates to the index.
>
> -Janne
>
>
> 2010/2/18 Jan Høydahl / Cominvent
>
> > Hi,
> >
> > Have you tried playing with mergeFactor or even mergePolicy?
> >
> > --
> > Jan Høydahl - search architect
> > Cominvent AS - www.cominvent.com
> >
> > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> >
> > > Hey Dipti,
> > >
> > > Basically query optimizations + setting cache sizes to a very high
> > > level. Other than that, the config is about the same as the
> > > out-of-the-box config that comes with the Solr download.
> > >
> > > I haven't found a magic switch to get very fast query responses +
> > > facet counts with the frequency of commits I'm having using one
> > > single SOLR instance.
> > > Adding some TOP queries for a certain type of user to static
> > > warming queries just moved the time of autowarming the caches to
> > > the time it took to warm the caches with static queries.
> > > I've been staging a setup where there's a small solr instance
> > > receiving all the updates and a large instance which doesn't
> > > receive the live feed of updates.
> > > The small index will be merged with the large index periodically
> > > (once a week or once a month).
> > > The two instances are seen by the client app as one instance using
> > > the sharding features of SOLR.
> > > The instances are running on the same server inside their own JVM /
> > > jetty.
> > >
> > > In this setup the caches are very HOT for the large index and
> > > queries are extremely fast, and the small index is small enough to
> > > get extremely fast queries without having to warm up the caches too
> > > much.
> > >
> > > Basically I'm able to have a commit frequency of 10 seconds in a
> > > 40M docs index while counting TOP5 facets over 14 fields in 200ms.
> > > In reality the commit frequency of 10 seconds comes from the fact
> > > that the updates are going into a 1M - 2M documents index, and the
> > > fast facet counts from the fact that the 38M documents index has
> > > hot caches and doesn't receive any updates.
> > >
> > > Also, not running updates to the large index means that the SOLR
> > > instance reading the large index uses about half the memory it used
> > > before when running the updates to the large index. At least it
> > > does so on Win2k3.
> > >
> > > -Janne
> > >
> > >
> > > 2010/2/15 dipti khullar
> > >
> > >> Hey Janne
> > >>
> > >> Can you please let me know what other optimizations are you
> > >> talking about here. Because in our application we are committing
> > >> in about 5 mins but still the response time is very low and at
> > >> times there are some connection timeouts also.
> > >>
> > >> Just wanted to confirm if you have done some major configuration
> > >> changes which have proved beneficial.
> > >>
> > >> Thanks
> > >> Dipti
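
As a rough illustration of the setup Janne describes (a small instance taking the updates, a large instance with hot caches, both queried as one index through sharding, with an hourly yyyymmddhh field for faceting), here is a minimal Python sketch. The hosts, ports, and the field name `ts_hour` are assumptions made up for the example and do not come from the thread. `shards`, `facet`, `facet.field`, `facet.limit`, and `rows` are standard Solr request parameters, but this is only one way the request could be put together, not Janne's actual client code.

```python
from datetime import datetime
from urllib.parse import urlencode


def hour_bucket(ts: datetime) -> str:
    """Format a timestamp as yyyymmddhh, the hourly facet key from the thread."""
    return ts.strftime("%Y%m%d%H")


def sharded_facet_url(base, shards, facet_fields, q="*:*", limit=5):
    """Build a distributed Solr facet query URL.

    base         -- select handler of the instance that receives the request
                    (hypothetical URL, chosen by the caller)
    shards       -- shard addresses in Solr's host:port/path form
    facet_fields -- fields to facet on (hypothetical names)
    """
    params = [
        ("q", q),
        ("rows", "0"),                 # only facet counts are wanted, no documents
        ("shards", ",".join(shards)),  # fan the query out to both instances
        ("facet", "true"),
        ("facet.limit", str(limit)),   # TOP5 facets, as in the thread
    ]
    # facet.field may be repeated, once per field to facet on
    params += [("facet.field", field) for field in facet_fields]
    return base + "?" + urlencode(params)


# Assumed layout: large read-only index on port 8983, small live index on 8984.
url = sharded_facet_url(
    "http://localhost:8983/solr/select",
    ["localhost:8983/solr", "localhost:8984/solr"],
    ["ts_hour", "level"],
)
```

With this layout, `hour_bucket(datetime(2010, 2, 18, 8))` gives `2010021808`, matching the example in Janne's mail, and the client only ever talks to one URL while commits go to the small instance alone.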