Ok, thanks. -Janne
2010/2/18 Jason Rutherglen <jason.rutherg...@gmail.com>

> Janne,
>
> I don't think there's any activity happening there.
>
> SOLR-1606 is the tracking issue for moving to per segment facets and
> docsets. I haven't had an immediate commercial need to implement
> those.
>
> Jason
>
> On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta
> <janne.majara...@gmail.com> wrote:
> > Hi Otis,
> >
> > Ok, now I'm confused ;)
> > There seems to be a bit of activity though when looking at the "last updated"
> > timestamps in the google code project wiki:
> > http://code.google.com/p/oceansearch/w/list
> >
> > The Tag Index feature sounds very interesting.
> >
> > -Janne
> >
> >
> > 2010/2/18 Otis Gospodnetic <otis_gospodne...@yahoo.com>
> >
> >> Hi Janne,
> >>
> >> I *think* Ocean Realtime Search has been superseded by Lucene NRT search.
> >>
> >> Otis
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Hadoop ecosystem search :: http://search-hadoop.com/
> >>
> >>
> >>
> >> ----- Original Message ----
> >> > From: Janne Majaranta <janne.majara...@gmail.com>
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Thu, February 18, 2010 2:12:37 AM
> >> > Subject: Re: Realtime search and facets with very frequent commits
> >> >
> >> > Hi,
> >> >
> >> > Yes, I did play with mergeFactor.
> >> > I didn't play with mergePolicy.
> >> >
> >> > Wouldn't that affect indexing speed and possibly memory usage?
> >> > I don't have any problems with indexing speed (1000 - 2000 docs / sec via
> >> > the standard HTTP API).
> >> >
> >> > My problem is that I need very warm caches to get fast faceting, and the
> >> > autowarming of the caches takes too long compared to the frequency of
> >> > commits I'm having.
> >> > So a commit every minute means less than a minute of time to warm the caches.
> >> >
> >> > To give you an idea of what kind of queries need to be autowarmed in my app,
> >> > the logevents indexed as documents have timestamps with different
> >> > granularity used for faceting.
> >> > For example, to get a count of logevents for every hour using faceting there's
> >> > a timestamp field with the format yyyymmddhh (for example: 2010021808
> >> > meaning 2010-02-18 8am).
> >> > One use case is to get hourly counts over the whole index. A non-cached
> >> > query counting the hourly counts over the 40M documents index takes a
> >> > while..
> >> > And to my understanding autowarming means that this kind of
> >> > query would basically be re-executed against a cold cache. Probably not
> >> > exactly how it works, but it "feels" like it would.
> >> >
> >> > Moving the commits to a smaller index while using sharding to have a
> >> > transparent view to the index from the client app seems to solve my problem.
> >> >
> >> > I'm not sure if the (upcoming?) NRT features would keep the caches more
> >> > persistent, probably not in an environment where docs get frequent updates /
> >> > deletes.
> >> >
> >> > Also, I'm closely following the Ocean Realtime Search project AND its SOLR
> >> > integration. It sounds like it has the "dream features" to enable realtime
> >> > updates to the index.
> >> >
> >> > -Janne
> >> >
> >> >
> >> > 2010/2/18 Jan Høydahl / Cominvent
> >> >
> >> > > Hi,
> >> > >
> >> > > Have you tried playing with mergeFactor or even mergePolicy?
> >> > >
> >> > > --
> >> > > Jan Høydahl - search architect
> >> > > Cominvent AS - www.cominvent.com
> >> > >
> >> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> >> > >
> >> > > > Hey Dipti,
> >> > > >
> >> > > > Basically query optimizations + setting cache sizes to a very high level.
> >> > > > Other than that, the config is about the same as the out-of-the-box config
> >> > > > that comes with the Solr download.
> >> > > >
> >> > > > I haven't found a magic switch to get very fast query responses + facet
> >> > > > counts with the frequency of commits I'm having using one single SOLR
> >> > > > instance.
> >> > > > Adding some TOP queries for a certain type of user to static warming queries
> >> > > > just moved the time of autowarming the caches to the time it took to warm
> >> > > > the caches with static queries.
> >> > > > I've been staging a setup where there's a small solr instance receiving all
> >> > > > the updates and a large instance which doesn't receive the live feed of
> >> > > > updates.
> >> > > > The small index will be merged with the large index periodically (once a
> >> > > > week or once a month).
> >> > > > The two instances are seen by the client app as one instance using the
> >> > > > sharding features of SOLR.
> >> > > > The instances are running on the same server inside their own JVM / jetty.
> >> > > >
> >> > > > In this setup the caches are very HOT for the large index and queries are
> >> > > > extremely fast, and the small index is small enough to get extremely fast
> >> > > > queries without having to warm up the caches too much.
> >> > > >
> >> > > > Basically I'm able to have a commit frequency of 10 seconds in a 40M docs
> >> > > > index while counting TOP5 facets over 14 fields in 200ms.
> >> > > > In reality the commit frequency of 10 seconds comes from the fact that the
> >> > > > updates are going into a 1M - 2M documents index, and the fast facet counts
> >> > > > from the fact that the 38M documents index has hot caches and doesn't
> >> > > > receive any updates.
> >> > > >
> >> > > > Also, not running updates to the large index means that the SOLR instance
> >> > > > reading the large index uses about half the memory it used before when
> >> > > > running the updates to the large index. At least it does so on Win2k3.
> >> > > >
> >> > > > -Janne
> >> > > >
> >> > > >
> >> > > > 2010/2/15 dipti khullar
> >> > > >
> >> > > >> Hey Janne
> >> > > >>
> >> > > >> Can you please let me know what other optimizations are you talking about
> >> > > >> here. Because in our application we are committing in about 5 mins but
> >> > > >> still the response time is very low and at times there are some connection
> >> > > >> time outs also.
> >> > > >>
> >> > > >> Just wanted to confirm if you have done some major configuration changes
> >> > > >> which have proved beneficial.
> >> > > >>
> >> > > >> Thanks
> >> > > >> Dipti
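
For anyone reading the thread later, here is a minimal sketch of the two ideas Janne describes: the yyyymmddhh hour bucket used for faceting, and one request that spans the small (live) and large (static) instances through sharding. It only builds the values and the request URL; it does not talk to Solr. The hosts, ports and the field name ts_hour are made up for illustration and are not from the thread. The request parameters shards, facet, facet.field and facet.limit are standard Solr parameters, but the exact setup Janne used is not shown in the thread, so treat this as an assumption-laden example rather than his configuration.

```python
from datetime import datetime
from urllib.parse import urlencode


def hour_bucket(ts: datetime) -> str:
    """Format a timestamp as the yyyymmddhh value described in the thread."""
    return ts.strftime("%Y%m%d%H")


def sharded_facet_query(base_url, shards, facet_fields, limit=5):
    """Build a facet-only request URL that spans several Solr shards."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),                  # only facet counts, no documents
        ("shards", ",".join(shards)),   # comma-separated host:port/path list
        ("facet", "true"),
        ("facet.limit", str(limit)),    # e.g. TOP5 values per field
    ]
    # One facet.field parameter per field to count.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "/select?" + urlencode(params)


# Hypothetical layout: large static index on 8983, small live index on 8984.
url = sharded_facet_query(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:8984/solr"],
    ["ts_hour"],  # hypothetical name for the yyyymmddhh field
)
print(hour_bucket(datetime(2010, 2, 18, 8)))
print(url)
```

The point of the split is that frequent commits only invalidate caches on the small instance, while the large one keeps its warm caches; the client still sees one combined set of facet counts.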