Hi Garth,

Yes, I'm straying from the OP's question (I think Steve is all set). But his question, quite naturally, comes up often, and a similar discussion ensues each time.
I take your point about shards and segments being different things. I understand that the hash ranges per segment are not kept in ZK. I guess I wish they were.

In this regard, I liked how MongoDB uses a two-level sharding scheme. Each shard manages a list of "chunks", each with its own hash range, which is kept in the cluster state. If data needs to be balanced across nodes, it works at the chunk level; no record/doc-level I/O is necessary. It is much more targeted, and only the data that needs to move is touched. Solr does most things better than Mongo, imo, but this is one area where Mongo got it right.

As for your example, what benefit does an application gain by reducing 10 segments, say, down to 1? Even if the index never changes? The gain _might_ be measurable, but it will be small compared to the performance gains that can be had by maintaining a good data balance across nodes.

Your example is based on implicit routing, so dynamic management of shards is less applicable. I just hope you get similar volumes of data every year. Otherwise, some years will perform better than others due to unbalanced data distribution!

best,
Charlie

-----Original Message-----
From: Garth Grimm [mailto:[email protected]]
Sent: Monday, June 29, 2015 1:15 PM
To: [email protected]
Subject: RE: optimize status

"Is there really a good reason to consolidate down to a single segment?"

Archiving (as one example). Come July 1, the collection for log entries/transactions in June will never be changed, so optimizing is actually a good thing to do.

Kind of getting away from the OP's question on this, but I don't think the ability to move data between shards in SolrCloud (such as shard splitting) has much to do with the Lucene segments under the hood. I'm just guessing, but I'd think the main issue with shard splitting would be to ensure that document route ranges are handled properly, and I don't think the value used for routing has anything to do with what segment the documents happen to be stored in.
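The two-level chunk scheme described earlier in this thread can be sketched in a few lines. This is a toy illustration only, not Mongo's or Solr's actual data structures; the chunk count, hash space, and field names are all invented for the example:

```python
# Toy sketch of a two-level sharding scheme: each shard owns a list of
# chunks, and each chunk owns a hash range recorded in the cluster state.
# Rebalancing reassigns whole chunks between shards; only the moved
# chunk's data is touched, never individual records on other chunks.

CHUNK_COUNT = 8
HASH_SPACE = 2**32

def make_chunks(n=CHUNK_COUNT):
    """Split the hash space into n contiguous ranges (the 'cluster state')."""
    step = HASH_SPACE // n
    return [{"lo": i * step, "hi": (i + 1) * step, "shard": i % 2}
            for i in range(n)]

def route(chunks, doc_hash):
    """Find the chunk (and hence shard) owning a document's hash."""
    for c in chunks:
        if c["lo"] <= doc_hash < c["hi"]:
            return c
    raise ValueError("hash out of range")

def move_chunk(chunks, idx, target_shard):
    """Rebalance: update one entry in the cluster state.
    No per-document I/O is modeled, which is the point of the scheme."""
    chunks[idx]["shard"] = target_shard

chunks = make_chunks()
owner = route(chunks, 123456789)   # lookup goes through the chunk table
move_chunk(chunks, 0, 1)           # only chunk 0's data would physically move
```

The targeted rebalancing Charlie describes falls out of the indirection: the routing table is updated in one place, and the record-level data follows the chunk.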
-----Original Message-----
From: Reitzel, Charles [mailto:[email protected]]
Sent: Monday, June 29, 2015 11:38 AM
To: [email protected]
Subject: RE: optimize status

Is there really a good reason to consolidate down to a single segment? Any incremental query performance benefit is tiny compared to the loss of manageability. I.e., shouldn't segments _always_ be kept small enough to facilitate re-balancing data across shards?

Even in non-cloud instances this is true. When a collection grows, you may want to shard/split an existing index by adding a node and moving some segments around. Isn't this the direction Solr is going? With many, smaller segments, this is feasible. With "one big segment", the collection must always be reindexed.

Thus, "optimize" would mean "get rid of all deleted records" and would, in fact, optimize queries by eliminating wasted I/O. Perhaps worth it for slowly changing indexes. Seems like the Tiered merge policy is 90% there ... Or am I all wet (again)?

-----Original Message-----
From: Walter Underwood [mailto:[email protected]]
Sent: Monday, June 29, 2015 10:39 AM
To: [email protected]
Subject: Re: optimize status

"Optimize" is a manual full merge. Solr automatically merges segments as needed. This also expunges deleted documents.

We really need to rename "optimize" to "force merge". Is there a Jira for that?

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/ (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White <[email protected]> wrote:

> Hi Upayavira,
>
> This is news to me that we should not optimize an index.
>
> What about disk space savings? Isn't optimization needed to reclaim disk
> space, or does Solr somehow do that? Where can I read more about this?
>
> I'm on Solr 5.1.0 (may switch to 5.2.1)
>
> Thanks
>
> Steve
>
> On Mon, Jun 29, 2015 at 4:16 AM, Upayavira <[email protected]> wrote:
>
>> I'm afraid I don't understand. You're saying that optimising is
>> causing performance issues?
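For reference, the manual full merge Walter mentions is triggered through the update handler's `optimize` command, which accepts a `maxSegments` parameter. Below is a hedged sketch that only builds the request URL; the host, port, and core name `mycore` are placeholders, not anything from this thread:

```python
from urllib.parse import urlencode

def optimize_url(base, core, max_segments=1, wait_searcher=True):
    """Build the update-handler URL that forces a merge down to
    max_segments segments (Solr's 'optimize', i.e. a forced merge)."""
    params = urlencode({
        "optimize": "true",
        "maxSegments": max_segments,
        "waitSearcher": str(wait_searcher).lower(),
    })
    return f"{base}/solr/{core}/update?{params}"

url = optimize_url("http://localhost:8983", "mycore", max_segments=1)
# Issuing the request requires a running Solr; shown for illustration only:
# import urllib.request
# urllib.request.urlopen(url).read()
```

Setting `maxSegments` higher than 1 gives a middle ground between "one big segment" and leaving merging entirely to the merge policy, which is roughly the trade-off Charles is describing.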
>>
>> Simple solution: DO NOT OPTIMIZE!
>>
>> Optimisation is very badly named. What it does is squash all
>> segments in your index into one segment, removing all deleted
>> documents. It is good to get rid of deletes - in that sense the index
>> is "optimized".
>> However, future merges become very expensive. The best way to handle
>> this topic is to leave it to Lucene/Solr to do it for you. Pretend
>> the "optimize" option never existed.
>>
>> This is, of course, assuming you are using something like Solr 3.5+.
>>
>> Upayavira
>>
>> On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
>>>
>>> Have to, because of performance issues.
>>> Just want to know if there is a way to tap into the status.
>>>
>>>> On Jun 28, 2015, at 11:37 PM, Upayavira <[email protected]> wrote:
>>>>
>>>> Bigger question: why are you optimizing? Since 3.6 or so, it
>>>> generally hasn't been required and, if anything, is a bad thing.
>>>>
>>>> Upayavira
>>>>
>>>>> On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
>>>>> Hi All,
>>>>>
>>>>> I have two indexers (independent processes) writing to a common
>>>>> Solr core.
>>>>> If one indexer process issues an optimize on the core, I want the
>>>>> second indexer to wait to add docs until the optimize has
>>>>> finished.
>>>>>
>>>>> Are there ways I can do this programmatically?
>>>>> Pinging the core while the optimize is happening returns OK,
>>>>> because technically Solr allows you to update while an optimize
>>>>> is happening.
>>>>>
>>>>> Any suggestions?
>>>>>
>>>>> thanks,
>>>>> Summer

*************************************************************************
This e-mail may contain confidential or privileged information. If you are not the intended recipient, please notify the sender immediately and then delete it.
TIAA-CREF
*************************************************************************
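As an aside on Summer's original status question: one way the second indexer could "tap into the status" is to poll the Luke request handler, which reports index-level details including the segment count, and treat the merge as finished once the count reaches the target. This is a rough sketch; the endpoint path is Solr's standard Luke handler, but the `segmentCount` field name and the core name `mycore` should be verified against your Solr version before relying on it:

```python
import json

def merge_finished(luke_response, target_segments=1):
    """Given the JSON body returned by
    /solr/<core>/admin/luke?numTerms=0&wt=json, decide whether the index
    has been merged down to target_segments segments."""
    info = (json.loads(luke_response)
            if isinstance(luke_response, str) else luke_response)
    return info["index"]["segmentCount"] <= target_segments

# A polling loop would fetch the URL repeatedly (requires a running Solr):
# import time, urllib.request
# url = "http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json"
# while not merge_finished(json.load(urllib.request.urlopen(url))):
#     time.sleep(5)

# Fake response, for illustration only:
sample = {"index": {"numDocs": 100, "segmentCount": 1}}
```

Note this only tells the second indexer when the merge is done; it does not prevent concurrent updates, which, as Summer observed, Solr permits during an optimize.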
