Hi Don,

We've got a similar requirement in our environment - here's what we've found. Every time you commit, you're doing a fairly disk-I/O-intensive operation: Solr flushes the pending document(s) to the index on disk and opens a new searcher so they become visible.
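(For reference, I'm assuming an autocommit setup roughly like the one below in solrconfig.xml - the values are only illustrative of a ~500ms window, not a recommendation:)

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending... -->
        <maxTime>500</maxTime>    <!-- ...or after 500 ms, whichever comes first -->
      </autoCommit>
    </updateHandler>

With something like that in place, Solr fires one of those commits every half second whenever there are uncommitted docs, which is where the cost comes from.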
For very small indexes (say, under 10,000 docs), the commit time is pretty short and you can get away with frequent commits. With large indexes, a commit can take seconds to complete and uses a fair bit of CPU and disk resource along the way. That of course impacts search performance, and it won't get your docs searchable within your 500ms requirement.

The planned NRT (near-real-time) feature (scheduled for 1.5, I believe?) is probably what you need; with it, Lucene commits are done on a per-segment basis. You could also check out the Zoie plugin, but make sure you're not also committing to disk straight away, and that you don't mind having to re-index some data if your server crashes (Zoie uses an in-memory lookup for new doc insertions).

HTH
Peter

On Fri, Apr 16, 2010 at 10:13 AM, Don Werve <d...@madwombat.com> wrote:
> We're using Solr as the backbone for our shiny new helpdesk application,
> and by and large it's been a big win... especially in terms of search
> performance. But before I pat myself on the back because the Solr devs
> have done a great job, I had a question regarding commit frequency.
>
> While our app doesn't need truly realtime search, documents get updated
> and replaced somewhat frequently, and those changes need to be visible in
> the index within 500ms. At the moment, I'm using autocommit to satisfy
> this, but I've run across a few threads mentioning that frequent commits
> may cause some serious performance issues.
>
> Our average document size is quite small (less than 10k), and I'm expecting
> that we're going to have a maximum of around 100k documents per day on any
> given index; most of these will be replacing existing documents.
>
> So, rather than getting bitten by this down the road, I figure I may as
> well (a) ask if anybody else here is running a similar setup or has any
> input, and then (b) do some heavy load testing via a fake data generator.
>
> Thanks-in-advance!