Ok,

Thanks Yonik and Otis.
I already had static warming queries with facets turned on and autowarming
at zero.
A lot of other optimizations have been made since then, however, so I'll try
zero autowarming with static warming queries again.
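(For reference, the static warming setup I mean is roughly the following in
solrconfig.xml; the field names are just examples, not the real schema, and
the cache sizes are placeholders:

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">level</str>
        <str name="facet.field">host</str>
      </lst>
    </arr>
  </listener>

  <filterCache class="solr.FastLRUCache"
               size="16384" initialSize="4096" autowarmCount="0"/>
  <fieldValueCache class="solr.FastLRUCache"
                   size="128" autowarmCount="0"/>

The same queries also go into a firstSearcher listener so a cold restart is
covered.)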

If that doesn't work, I'll go with 3 instances on the same server.
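For reference, the layout I have in mind looks roughly like this (the ports,
paths and field names below are only examples, not the real configuration).
Updates are posted to the small instance so they become visible within a
minute, and the empty front instance fans queries out to both backends with
the "shards" parameter:

  Updates, sent to the small instance's /update handler:

    <add commitWithin="60000">
      ... log event documents ...
    </add>

  Queries, sent to the empty coordinating instance:

    http://localhost:8981/solr/select?q=*:*&rows=0
        &shards=localhost:8983/solr,localhost:8984/solr
        &facet=true&facet.field=level&facet.limit=5
        &stats=true&stats.field=duration&stats.facet=level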

BTW, does it sound normal that when running updates every minute against a
36M-document index, the instance eats all of the available heap after about
5 commits, even though not a single query is executed against the index and
autowarming is set to zero? Just curious.

-Janne


2010/2/11 Otis Gospodnetic <otis_gospodne...@yahoo.com>

> Janne,
>
> The answers to your last 2 questions are both yes.  I've seen that done a
> few times and it works.  I don't have the answer to the always-hot cache
> question.
>
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> ----- Original Message ----
> > From: Janne Majaranta <janne.majara...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Thu, February 11, 2010 12:35:20 PM
> > Subject: Realtime search and facets with very frequent commits
> >
> > Hello,
> >
> > I have a log-search-like application which requires indexed log events
> > to be searchable within a minute, and which uses facets and the
> > StatsComponent.
> >
> > Some stats:
> > - The log events are indexed every 10 seconds with a "commitWithin" of 60
> > seconds.
> > - 1M events / day (~75% are updates to previous events).
> > - Faceting over 14 fields (strings). Usually the top 5 values by
> > numdocs, but facets for all 14 fields at the same time.
> > - Heavy use of StatsComponent (stats over facets of ~36M documents).
> >
> >
> > The application is running a single Solr instance. All updates and
> > queries are sent to the same instance.
> > Faceting and the StatsComponent are both amazingly fast with that number
> > of documents *when* the caches are warm.
> >
> > The problem I'm now facing is that keeping the caches warm is too heavy
> > compared to the frequency of updates.
> > It takes over 60 seconds to warm up the caches to the level where facets
> > and stats are returned in milliseconds.
> >
> > I have tested putting a second Solr instance on the same server and
> > sending the updates to that new instance.
> > Warming up the new small instance is very fast while the large instance
> > has very hot caches.
> >
> > I also put a third (empty) Solr instance on the same server which passes
> > the queries to the two instances with the "shards" parameter. This is
> > mainly because the client app really doesn't have to know anything about
> > the shards.
> >
> > The setup was easy to configure, the responses are back in milliseconds
> > and the updates are visible in seconds.
> > That is, responses in milliseconds over 40M documents and an update
> > frequency of 15 seconds on a single physical server.
> > The (lab) server has 16 GB of RAM and is running Win2k3 (Windows Server
> > 2003).
> >
> > Also, what I found out is that with the sharded setup I only need half
> > the memory for the large instance.
> > When indexing to the large instance, the memory usage climbs very quickly
> > to the maximum allocated heap size and never goes down.
> >
> > My question is: is there a magic switch in Solr to support that kind of
> > update frequency while keeping the caches hot?
> > Or is it just impossible to achieve facet counts and queries in
> > milliseconds while updating the index every minute?
> >
> > The second question is: does the setup with an empty Solr instance as a
> > "coordinating" instance, a large Solr instance with hot caches and a
> > small Solr instance with immediate updates, all on the same physical
> > server, sound like a durable solution (until the small instance gets
> > big), or is it something braindead?
> >
> > And the third question is: would it be a good idea to merge the small
> > and the large index periodically, so that a fresh and empty small
> > instance would be available after the merge?
> >
> > Any ideas?
> >
> > Best Regards,
> >
> > Janne Majaranta
>
>
