Re: Realtime search and facets with very frequent commits

Otis Gospodnetic Thu, 18 Feb 2010 05:44:24 -0800

Hi Janne,

I *think* Ocean Realtime Search has been superseded by Lucene NRT search.


 Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Janne Majaranta <janne.majara...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, February 18, 2010 2:12:37 AM
> Subject: Re: Realtime search and facets with very frequent commits
> 
> Hi,
> 
> Yes, I did play with mergeFactor.
> I didn't play with mergePolicy.
> 
> Wouldn't that affect indexing speed and possibly memory usage ?
> I don't have any problems with indexing speed ( 1000 - 2000 docs / sec via
> the standard HTTP API ).
> 
> My problem is that I need very warm caches to get fast faceting, and the
> autowarming of the caches takes too long compared to the frequency of
> commits I'm having.
> So a commit every minute leaves less than a minute to warm the caches.
> 
> To give you an idea of what kind of queries need to be autowarmed in my app,
> the logevents indexed as documents have timestamps with different
> granularity used for faceting.
> For example, to get count of logevents for every hour using faceting there's
> a timestamp field with the format yyyymmddhh ( for example: 2010021808
> meaning 2010-02-18 8am).
> One use case is to get hourly counts over the whole index. A non-cached
> query counting the hourly counts over the 40M documents index takes a
> while..
> And to my understanding, autowarming means that this kind of query is
> basically re-executed against a cold cache. Probably not exactly how it
> works, but it "feels" like it would.
> 
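
A minimal sketch of the hourly bucketing described above, assuming each log event carries a timestamp. The function name and the sample events are made up for illustration and are not taken from the thread; the Counter only mimics what a facet on the hourly field would return.

```python
from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> str:
    """Format a timestamp as yyyymmddhh, e.g. 2010021808 for 2010-02-18 8am."""
    return ts.strftime("%Y%m%d%H")


# Hypothetical log events; in the setup described, each would be a Solr
# document with the bucket stored in its own field.
events = [
    datetime(2010, 2, 18, 8, 5),
    datetime(2010, 2, 18, 8, 47),
    datetime(2010, 2, 18, 9, 1),
]

# Count events per hourly bucket, which is the shape of result a facet on
# the hourly field gives back.
hourly_counts = Counter(hour_bucket(ts) for ts in events)
```

Storing the bucket at index time keeps the facet to a plain field facet, but an uncached count still has to visit every matching document, which is why it is slow on a cold cache.
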
> Moving the commits to a smaller index while using sharding to have a
> transparent view to the index from the client app seems to solve my problem.
> 
> I'm not sure if the (upcoming?) NRT features would keep the caches more
> persistent, probably not in an environment where docs get frequent updates /
> deletes.
> 
> Also, I'm closely following the Ocean Realtime Search project and its SOLR
> integration. It sounds like it has the "dream features" to enable realtime
> updates to the index.
> 
> -Janne
> 
> 
> 2010/2/18 Jan Høydahl / Cominvent 
> 
> > Hi,
> >
> > Have you tried playing with mergeFactor or even mergePolicy?
> >
> > --
> > Jan Høydahl  - search architect
> > Cominvent AS - www.cominvent.com
> >
> > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> >
> > > Hey Dipti,
> > >
> > > Basically query optimizations + setting cache sizes to a very high level.
> > > Other than that, the config is about the same as the out-of-the-box config
> > > that comes with the Solr download.
> > >
> > > I haven't found a magic switch to get very fast query responses + facet
> > > counts with the frequency of commits I'm having using one single SOLR
> > > instance.
> > > Adding some TOP queries for a certain type of user to static warming queries
> > > just moved the time of autowarming the caches to the time it took to warm
> > > the caches with static queries.
> > > I've been staging a setup where there's a small solr instance receiving all
> > > the updates and a large instance which doesn't receive the live feed of
> > > updates.
> > > The small index will be merged with the large index periodically (once a
> > > week or once a month).
> > > The two instances are seen by the client app as one instance using the
> > > sharding features of SOLR.
> > > The instances are running on the same server inside their own JVM / jetty.
> > >
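
A rough sketch of how a client could query the two instances as one, using Solr's `shards` request parameter together with the standard facet parameters. The hosts, ports, and field names below are assumptions for illustration; the thread does not give the actual values.

```python
from urllib.parse import urlencode


def sharded_facet_url(base, shards, facet_fields, limit=5):
    """Build a Solr select URL that fans out to several shards and facets."""
    params = [
        ("q", "*:*"),                  # match all documents
        ("rows", "0"),                 # only facet counts are wanted
        ("shards", ",".join(shards)),  # comma-separated host:port/path list
        ("facet", "true"),
        ("facet.limit", str(limit)),   # top-N values per field
    ]
    # One facet.field parameter per field to count.
    params += [("facet.field", name) for name in facet_fields]
    return base + "/select?" + urlencode(params)


# Hypothetical layout: small (live) index on 8983, large (static) on 8984.
url = sharded_facet_url(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:8984/solr"],
    ["hour", "level"],
)
```

The instance that receives the request merges the per-shard facet counts, so the client app only ever sees one endpoint.
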
> > > In this setup the caches are very HOT for the large index and queries are
> > > extremely fast, and the small index is small enough to get extremely fast
> > > queries without having to warm up the caches too much.
> > >
> > > Basically I'm able to have a commit frequency of 10 seconds in a 40M docs
> > > index while counting TOP5 facets over 14 fields in 200ms.
> > > In reality the commit frequency of 10 seconds comes from the fact that the
> > > updates are going into a 1M - 2M documents index, and the fast facet counts
> > > from the fact that the 38M documents index has hot caches and doesn't
> > > receive any updates.
> > >
> > > Also, not running updates to the large index means that the SOLR instance
> > > reading the large index uses about half the memory it used before when
> > > running the updates to the large index. At least it does so on Win2k3.
> > >
> > > -Janne
> > >
> > >
> > > 2010/2/15 dipti khullar 
> > >
> > >> Hey Janne
> > >>
> > >> Can you please let me know what other optimizations you are talking about
> > >> here? In our application we are committing about every 5 mins, but still
> > >> the response time is very slow and at times there are some connection
> > >> timeouts as well.
> > >>
> > >> Just wanted to confirm if you have done some major configuration changes
> > >> which have proved beneficial.
> > >>
> > >> Thanks
> > >> Dipti
> > >>
> > >>
> >
> >
