Ok, thanks.

-Janne


2010/2/18 Jason Rutherglen <jason.rutherg...@gmail.com>

> Janne,
>
> I don't think there's any activity happening there.
>
> SOLR-1606 is the tracking issue for moving to per-segment facets and
> docsets. I haven't had an immediate commercial need to implement those.
>
> Jason
>
> On Thu, Feb 18, 2010 at 7:04 AM, Janne Majaranta
> <janne.majara...@gmail.com> wrote:
> > Hi Otis,
> >
> > Ok, now I'm confused ;)
> > There seems to be a bit of activity, though, when looking at the "last
> > updated" timestamps in the Google Code project wiki:
> > http://code.google.com/p/oceansearch/w/list
> >
> > The Tag Index feature sounds very interesting.
> >
> > -Janne
> >
> >
> > 2010/2/18 Otis Gospodnetic <otis_gospodne...@yahoo.com>
> >
> >> Hi Janne,
> >>
> >> I *think* Ocean Realtime Search has been superseded by Lucene NRT
> >> search.
> >>
> >>  Otis
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Hadoop ecosystem search :: http://search-hadoop.com/
> >>
> >>
> >>
> >> ----- Original Message ----
> >> > From: Janne Majaranta <janne.majara...@gmail.com>
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Thu, February 18, 2010 2:12:37 AM
> >> > Subject: Re: Realtime search and facets with very frequent commits
> >> >
> >> > Hi,
> >> >
> >> > Yes, I did play with mergeFactor.
> >> > I didn't play with mergePolicy.
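
(Just for reference, a sketch of where those two knobs sit: in a stock
Solr 1.4-era setup they are configured in solrconfig.xml, roughly like
this; the values shown are the shipped defaults, not a recommendation.)

  <indexDefaults>
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>32</ramBufferSizeMB>
  </indexDefaults>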
> >> >
> >> > Wouldn't that affect indexing speed and possibly memory usage?
> >> > I don't have any problems with indexing speed (1000 - 2000 docs/sec via
> >> > the standard HTTP API).
> >> >
> >> > My problem is that I need very warm caches to get fast faceting, and
> >> > the autowarming of the caches takes too long compared to the frequency
> >> > of commits I'm having.
> >> > So a commit every minute means less than a minute to warm the caches.
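
(A minimal sketch of the knob being discussed, with placeholder numbers:
the autowarm behaviour is set per cache in solrconfig.xml, e.g. for the
filterCache, where autowarmCount is how many entries of the old cache are
regenerated against the new searcher after each commit.)

  <filterCache class="solr.FastLRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="1024"/>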
> >> >
> >> > To give you an idea of what kind of queries need to be autowarmed in
> >> > my app, the log events indexed as documents have timestamps with
> >> > different granularity used for faceting.
> >> > For example, to get counts of log events for every hour using faceting,
> >> > there's a timestamp field with the format yyyymmddhh (for example,
> >> > 2010021808 meaning 2010-02-18 8am).
> >> > One use case is to get hourly counts over the whole index. A non-cached
> >> > query counting the hourly totals over the 40M-document index takes a
> >> > while.
> >> > To my understanding, autowarming means that this kind of query would
> >> > basically be re-executed against a cold cache. Probably not exactly how
> >> > it works, but it "feels" like it would.
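
(To make that concrete, a sketch with an invented field name: the hourly
counts come from a single facet query along these lines, and this is also
the shape of query that a warming run would have to re-execute.)

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=timestamp_hour&facet.limit=-1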
> >> >
> >> > Moving the commits to a smaller index, while using sharding to give the
> >> > client app a transparent view of the whole index, seems to solve my
> >> > problem.
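
(The "transparent view" here is Solr's distributed search: the client hits
one URL and that core fans the request out via the shards parameter. Host
and core names below are invented.)

  http://localhost:8983/solr/large/select?q=*:*&rows=0&facet=true&facet.field=timestamp_hour&shards=localhost:8983/solr/large,localhost:8984/solr/small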
> >> >
> >> > I'm not sure if the (upcoming?) NRT features would keep the caches
> >> > more persistent; probably not in an environment where docs get frequent
> >> > updates / deletes.
> >> >
> >> > Also, I'm closely following the Ocean Realtime Search project AND its
> >> > SOLR integration. It sounds like it has the "dream features" to enable
> >> > realtime updates to the index.
> >> >
> >> > -Janne
> >> >
> >> >
> >> > 2010/2/18 Jan Høydahl / Cominvent
> >> >
> >> > > Hi,
> >> > >
> >> > > Have you tried playing with mergeFactor or even mergePolicy?
> >> > >
> >> > > --
> >> > > Jan Høydahl  - search architect
> >> > > Cominvent AS - www.cominvent.com
> >> > >
> >> > > On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
> >> > >
> >> > > > Hey Dipti,
> >> > > >
> >> > > > Basically query optimizations + setting cache sizes to a very high
> >> > > > level.
> >> > > > Other than that, the config is about the same as the out-of-the-box
> >> > > > config that comes with the Solr download.
> >> > > >
> >> > > > I haven't found a magic switch to get very fast query responses +
> >> > > > facet counts with the frequency of commits I'm having using one
> >> > > > single SOLR instance.
> >> > > > Adding some TOP queries for a certain type of user to the static
> >> > > > warming queries just moved the cost of autowarming the caches to the
> >> > > > time it took to warm the caches with the static queries (see the
> >> > > > sketches after this paragraph).
> >> > > > I've been staging a setup where there's a small Solr instance
> >> > > > receiving all the updates and a large instance which doesn't receive
> >> > > > the live feed of updates.
> >> > > > The small index will be merged with the large index periodically
> >> > > > (once a week or once a month).
> >> > > > The two instances are seen by the client app as one instance using
> >> > > > the sharding features of SOLR.
> >> > > > The instances are running on the same server, each inside its own
> >> > > > JVM / jetty.
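
(Two sketches of the mechanics described above, with invented names and
paths. Static warming queries are registered on the newSearcher event in
solrconfig.xml:)

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">timestamp_hour</str>
      </lst>
    </arr>
  </listener>

(And the periodic merge can be done with the CoreAdmin mergeindexes action
that, I believe, shipped with Solr 1.4, pointing the big core at the small
core's index directory; the source index must not be written to while the
merge runs, and the target core needs a commit afterwards.)

  http://localhost:8983/solr/admin/cores?action=mergeindexes&core=large&indexDir=/path/to/small-core/data/index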
> >> > > >
> >> > > > In this setup the caches are very HOT for the large index and
> >> > > > queries are extremely fast, and the small index is small enough to
> >> > > > get extremely fast queries without having to warm up the caches too
> >> > > > much.
> >> > > >
> >> > > > Basically I'm able to have a commit frequency of 10 seconds in a
> >> > > > 40M-doc index while counting TOP5 facets over 14 fields in 200ms.
> >> > > > In reality the commit frequency of 10 seconds comes from the fact
> >> > > > that the updates are going into a 1M - 2M documents index, and the
> >> > > > fast facet counts come from the fact that the 38M documents index
> >> > > > has hot caches and doesn't receive any updates.
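
(For illustration only, with invented field names: a TOP5 facet count over
many fields is still a single request, since facet.field can be repeated
and facet.limit applies to each field.)

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.limit=5&facet.field=host&facet.field=severity&facet.field=application&facet.field=user&facet.field=timestamp_hour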
> >> > > >
> >> > > > Also, not running updates to the large index means that the SOLR
> >> > > > instance reading the large index uses about half the memory it used
> >> > > > before when running the updates to the large index. At least it
> >> > > > does so on Win2k3.
> >> > > >
> >> > > > -Janne
> >> > > >
> >> > > >
> >> > > > 2010/2/15 dipti khullar
> >> > > >
> >> > > >> Hey Janne
> >> > > >>
> >> > > >> Can you please let me know what other optimizations you are
> >> > > >> talking about here? Because in our application we are committing
> >> > > >> about every 5 mins, but still the response times are poor and at
> >> > > >> times there are some connection timeouts as well.
> >> > > >>
> >> > > >> Just wanted to confirm if you have done some major configuration
> >> > > >> changes which have proved beneficial.
> >> > > >>
> >> > > >> Thanks
> >> > > >> Dipti
