Janne, I usually just turn the caches nearly off for frequent commits.

Jason

On Thu, Feb 11, 2010 at 9:35 AM, Janne Majaranta <janne.majara...@gmail.com> wrote:
> Hello,
>
> I have a log-search-like application which requires indexed log events to be
> searchable within a minute and uses facets and the StatsComponent.
>
> Some stats:
> - The log events are indexed every 10 seconds with a "commitWithin" of 60
>   seconds.
> - 1M events / day (~75% are updates to previous events).
> - Faceting over 14 fields (strings). Usually TOP5 by numdocs, but facets
>   for all 14 fields at the same time.
> - Heavy use of StatsComponent (stats over facets of ~36M documents).
>
> The application is running a single Solr instance. All updates and queries
> are sent to the same instance.
> Faceting and the StatsComponent are both amazingly fast with that amount of
> documents *when* the caches are warm.
>
> The problem I'm now facing is that keeping the caches warm is too heavy
> compared to the frequency of updates.
> It takes over 60 seconds to warm up the caches to the level where facets and
> stats are returned in milliseconds.
>
> I have tested putting a second Solr instance on the same server and sending
> the updates to that new instance.
> Warming up the new small instance is very fast, while the large instance has
> very hot caches.
>
> I also put a third (empty) Solr instance on the same server which passes the
> queries to the two instances with the "shards" parameter. This is mainly so
> that the client app doesn't have to know anything about the shards.
>
> The setup was easy to configure, responses are back in milliseconds, and
> the updates are visible in seconds.
> That is, responses in milliseconds over 40M documents and an update frequency
> of 15 seconds on a single physical server.
> The (lab) server has 16 GB RAM and it is running win23k.
>
> Also, what I found out is that with the sharded setup I only need half the
> memory for the large instance.
> When indexing to the large instance, the memory usage goes up very fast to
> the maximum allocated heap size and never goes down.
>
> My question is, is there a magic switch in SOLR to have that kind of update
> frequency while having the caches on fire?
> Or is it just impossible to achieve facet counts and queries in milliseconds
> while updating the index every minute?
>
> The second question is, the setup with an empty SOLR as a "coordinating"
> instance, a large SOLR instance with hot caches, and a small SOLR instance
> with immediate updates, all on the same physical server: does it sound like
> a durable solution (until the small instance gets big), or is it something
> braindead?
>
> And the third question is, would it be a good idea to merge the small and
> the large index periodically so that a fresh and empty small instance would
> be available after the merge?
>
> Any ideas?
>
> Best Regards,
>
> Janne Majaranta
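
For anyone reading this in the archive, the coordinator arrangement described above might look roughly like the sketch below from the client side. It only builds the query URL that would be sent to the empty coordinating instance, which then fans out via the "shards" parameter. The hosts, ports, field names and the helper name build_query_url are all made up for illustration; the request parameters themselves (shards, facet, facet.field, facet.limit, stats, stats.field) are standard Solr ones.

```python
from urllib.parse import urlencode

# Hypothetical hosts, ports and paths; adjust to the real setup.
COORDINATOR = "http://localhost:8983/solr/select"
SHARDS = [
    "localhost:8984/solr",  # large instance with hot caches
    "localhost:8985/solr",  # small instance that receives the updates
]


def build_query_url(q, facet_fields, stats_field, top_n=5):
    """Build a distributed query URL for the coordinating instance."""
    params = [
        ("q", q),
        # Comma-separated shard list; the coordinator merges the results.
        ("shards", ",".join(SHARDS)),
        ("facet", "true"),
        ("facet.limit", str(top_n)),
        ("stats", "true"),
        ("stats.field", stats_field),
    ]
    # One facet.field parameter per faceted field.
    params += [("facet.field", name) for name in facet_fields]
    return COORDINATOR + "?" + urlencode(params)


# Field names here are placeholders, not taken from the original post.
url = build_query_url("level:ERROR", ["host", "component"], "duration")
print(url)
```

The client only ever talks to the coordinator, so shards can be added, merged or swapped without touching the application.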