You shouldn't _have_ to keep track of this yourself since Solr 4.4; see
SOLR-4965 and the associated Lucene JIRA. Those changes are supposed to make
issuing a commit on an index that hasn't changed a no-op.
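For reference, an explicit commit against the update handler is just a
request like the one built below. This is an illustrative sketch only; the
host, port, and core name are placeholders, not anything from your setup.

```python
# Sketch: building an explicit-commit request URL for Solr's update
# handler. Since Solr 4.4 (SOLR-4965), such a commit against an index
# with no pending changes is supposed to be a no-op, i.e. no new
# searcher should be opened. Base URL and core name are placeholders.
from urllib.parse import urlencode

def commit_url(base="http://localhost:8983/solr", core="collection1",
               open_searcher=True):
    """Build the update-handler URL for an explicit commit."""
    params = {"commit": "true",
              "openSearcher": "true" if open_searcher else "false"}
    return f"{base}/{core}/update?{urlencode(params)}"

print(commit_url())
# -> http://localhost:8983/solr/collection1/update?commit=true&openSearcher=true
```

If new searchers are being opened anyway, watching the searcher-open events
in the logs after issuing this request is one way to confirm it.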
If you do issue commits and new searchers _do_ get opened when the index has
NOT changed, it's worth a JIRA.

FWIW,
Erick

On Wed, Jan 7, 2015 at 1:17 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
>> Is there a problem with multi-valued fields and distributed queries?
>
>> No. But there are some components that don't do the right thing in
>> distributed mode, joins for instance. The list is actually quite small
>> and is getting smaller all the time.
>
> Yes, joins is the main one. There used to be some dist constraints on
> grouping, but that might be from the 3.x days of field collapsing.
>
>> Sounds like you're doing something similar to us. In some cases we have
>> a hard commit every minute. Keeping the caches hot seems like a very
>> good reason to send data to a specific shard. At least I'm assuming that
>> when you add documents to a single shard and commit, the other shards
>> won't be impacted...
>
>> Not true if the other shards have had any indexing activity. The commit
>> is usually forwarded to all shards. If the individual index on a
>> particular shard is unchanged then it should be a no-op though.
>
> This is an excellent point, and well worth taking some care on.
> We do it by indexing to a number of shards, and only committing to those
> that actually have something to commit - although an empty commit might
> be a no-op on the indexing side, it's not on the autowarming/faceting
> side - care needs to be taken so that you don't hose your caches
> unnecessarily.
>
>
> On Wed, Jan 7, 2015 at 4:42 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> See below:
>>
>>
>> On Wed, Jan 7, 2015 at 1:25 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
>> > On 01/06/2015 07:54 PM, Erick Erickson wrote:
>> >>
>> >> Have you considered pre-supposing SolrCloud and using the SPLITSHARD
>> >> API command?
>> >
>> >
>> > I think that's the direction we'll probably be going. Index size (at
>> > least for us) can be unpredictable in some cases.
>> > Some clients start out small and then grow exponentially, while
>> > others start big and then don't grow much at all. Starting with
>> > SolrCloud would at least give us that flexibility.
>> >
>> > That being said, SPLITSHARD doesn't seem ideal. If a shard reaches a
>> > certain size, it would be better for us to simply add an extra shard,
>> > without splitting.
>> >
>>
>> True, and you can do this if you take explicit control of the document
>> routing, but... that's quite tricky. You forever after have to send any
>> _updates_ to the same shard you did the first time, whereas SPLITSHARD
>> will "do the right thing".
>>
>> >
>> >> On Tue, Jan 6, 2015 at 10:33 AM, Peter Sturge <peter.stu...@gmail.com>
>> >> wrote:
>> >>>
>> >>> ++1 for the automagic shard creator. We've been looking into doing
>> >>> this sort of thing internally - i.e. when a shard reaches a certain
>> >>> size/num docs, it creates 'sub-shards' to which new commits are sent
>> >>> and queries to the 'parent' shard are included. The concept works,
>> >>> as long as you don't try any non-dist stuff - it's one reason why
>> >>> all our fields are always single-valued.
>> >
>> >
>> > Is there a problem with multi-valued fields and distributed queries?
>>
>> No. But there are some components that don't do the right thing in
>> distributed mode, joins for instance. The list is actually quite small
>> and is getting smaller all the time.
>>
>> >
>> >>> A cool side-effect of sub-sharding (for lack of a snappy term) is
>> >>> that the parent shard then stops suffering from auto-warming latency
>> >>> due to commits (we do a fair amount of committing). In theory, you
>> >>> could carry on sub-sharding until your hardware starts gasping for
>> >>> air.
>> >
>> >
>> > Sounds like you're doing something similar to us. In some cases we
>> > have a hard commit every minute. Keeping the caches hot seems like a
>> > very good reason to send data to a specific shard.
>> > At least I'm assuming that when you add documents to a single shard
>> > and commit, the other shards won't be impacted...
>>
>> Not true if the other shards have had any indexing activity. The commit
>> is usually forwarded to all shards. If the individual index on a
>> particular shard is unchanged then it should be a no-op though.
>>
>> But the usage pattern here is its own bit of a trap. If all your
>> indexing is going to a single shard, then the entire indexing _load_ is
>> also happening on that shard, so CPU utilization will be higher there
>> than on the older shards. Since distributed requests need a response
>> from every shard before returning to the client, the response time is
>> bounded by the slowest shard, and this may actually be slower. Probably
>> only noticeable when the CPU is maxed anyway, though.
>>
>>
>> >
>> > - Bram
>> >
>>
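The "only commit to shards that actually have something to commit"
bookkeeping from the thread can be sketched like this. This is a
hypothetical helper, not a real Solr or SolrJ API; the actual HTTP calls
to each shard's update handler are stubbed out with comments.

```python
# Sketch of selective committing: remember which shards received
# documents since the last commit and send the explicit commit only to
# those, so unchanged shards don't take an autowarming/cache hit.
# Shard URLs and the stubbed-out HTTP calls are hypothetical.

class SelectiveCommitter:
    def __init__(self, shard_urls):
        self.shard_urls = list(shard_urls)
        self._dirty = set()  # shards with uncommitted adds

    def add(self, shard_url, doc):
        # Real code would POST `doc` to f"{shard_url}/update" (e.g. with
        # distrib=false); here we only track the bookkeeping.
        self._dirty.add(shard_url)

    def commit_dirty(self):
        """Commit only where something changed; return what was committed."""
        committed = sorted(self._dirty)
        for url in committed:
            # Real code: request f"{url}/update?commit=true"
            pass
        self._dirty.clear()
        return committed
```

As the thread notes, this only helps on the query/autowarming side; on the
indexing side an empty commit should already be a no-op in Solr 4.4+.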