Realtime search and facets with very frequent commits

Janne Majaranta Thu, 11 Feb 2010 09:35:49 -0800

Hello,

I have a log-search-like application that requires indexed log events to be
searchable within a minute and that uses facets and the StatsComponent.


Some stats:
- The log events are indexed every 10 seconds with a "commitWithin" of 60
seconds.
- 1M events / day (~75% are updates to previous events).
- Faceting over 14 fields (strings). Usually only the top 5 values by
document count, but for all 14 fields in the same request.
- Heavy use of the StatsComponent (stats over facets of ~36M documents).
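
For reference, the "commitWithin" update from the first bullet can be sketched roughly as follows. This is only a minimal illustration of the Solr XML update message; the field names and the sample event are made up, and the message would still have to be POSTed to the instance's /update handler.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, commit_within_ms=60000):
    """Build a Solr XML <add> message that asks for a commit within the given time."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical log event; the field names are assumptions for illustration only.
events = [{"id": "evt-1", "level": "ERROR", "message": "disk full"}]
message = build_add_message(events)
print(message)
```

Re-sending a document with the same unique key replaces the earlier version, which is how the ~75% updates are handled.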


The application is running a single Solr instance. All updates and queries
are sent to the same instance.
Faceting and the StatsComponent are both amazingly fast with that amount of
documents *when* the caches are warm.

The problem I'm now facing is that keeping the caches warm is too expensive
relative to the frequency of updates.
It takes over 60 seconds to warm up the caches to the level where facets and
stats are returned in milliseconds.

I have tested putting a second solr instance on the same server and sending
the updates to that new instance.
Warming up the new small instance is very fast while the large instance has
very hot caches.

I also put a third (empty) Solr instance on the same server, which passes the
queries to the two other instances using the "shards" parameter. This is mainly
so that the client app doesn't have to know anything about the shards.
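
A query sent to the coordinating instance might be built roughly like this. It is a sketch only: the ports, core paths and field names are hypothetical, and only standard request parameters ("shards", "facet.field", "facet.limit", "stats.field") are used.

```python
from urllib.parse import urlencode


def build_sharded_query(coordinator, shards, facet_fields, stats_field, q="*:*"):
    """Build a select URL that fans out to the given shards and asks for facets and stats."""
    params = [
        ("q", q),
        # Shards are listed as host:port/path, comma separated, without "http://".
        ("shards", ",".join(shards)),
        ("facet", "true"),
        ("facet.limit", "5"),  # top 5 values per field
        ("stats", "true"),
        ("stats.field", stats_field),
    ]
    params += [("facet.field", name) for name in facet_fields]
    return coordinator + "/select?" + urlencode(params)


# Hypothetical layout: coordinator on 8983, large index on 8984, small index on 8985.
url = build_sharded_query(
    "http://localhost:8983/solr",
    ["localhost:8984/solr", "localhost:8985/solr"],
    ["host", "level"],       # assumed facet fields
    "duration",              # assumed numeric field for stats
)
print(url)
```

The client only ever sees the coordinator URL; adding or removing a shard means changing the "shards" list in one place.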

The setup was easy to configure and responses are back in milliseconds and
the updates are visible in seconds.
That is, responses in milliseconds over 40M documents and an update frequency
of 15 seconds on a single physical server.
The (lab) server has 16 GB of RAM and is running Win2k3.

I also found that with the sharded setup I only need half the memory for the
large instance.
When indexing directly into the large instance, memory usage climbs very
quickly to the maximum allocated heap size and never goes down.

My first question is: is there a magic switch in Solr to allow that kind of
update frequency while keeping the caches on fire?
Or is it just impossible to get facet counts and query results in milliseconds
while updating the index every minute?

The second question is about the setup with an empty Solr as a "coordinating"
instance, a large Solr instance with hot caches and a small Solr instance
with immediate updates, all on the same physical server. Does it sound like a
durable solution (until the small instance gets big), or is it something
braindead?

And the third question is, would it be a good idea to merge the small and
the large index periodically, so that a fresh and empty small instance would
be available after the merge?
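
To make the merge idea concrete, here is a sketch of what such a periodic merge request could look like, assuming a multi-core setup where the CoreAdmin "mergeindexes" action is available. The core name and index directories are hypothetical.

```python
from urllib.parse import urlencode


def build_merge_url(solr_base, target_core, index_dirs):
    """Build a CoreAdmin URL that merges the given index directories into target_core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", path) for path in index_dirs]
    return solr_base + "/admin/cores?" + urlencode(params)


# Hypothetical paths for the large and the small index.
merge_url = build_merge_url(
    "http://localhost:8983/solr",
    "merged",
    ["/data/solr/large/data/index", "/data/solr/small/data/index"],
)
print(merge_url)
```

The merged documents only become visible after a commit on the target core. One thing to verify before relying on this: as far as I know the merge does not deduplicate by unique key, so older copies of updated events would have to be deleted from the large index separately.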

Any ideas ?

Best Regards,

Janne Majaranta
