Thanks so much for clarifying. I have deployed the change to prod and it seems to be working. Some large segments were merged into 12GB segments and the deleted documents were physically removed.

I am wondering about 3 other things:

1 - You mentioned that I need free disk space. Just to make sure: we are talking about disk space here, so RAM can remain the same size? My current RAM size is: index size < RAM < 1.5 x index size.

2 - My guess is that the merge happens on disk and, once it completes, the data is synced with RAM. I am just guessing here ;-). I couldn't find a good explanation of this online.

3 - I am also wondering what you recommend for continuously purging deleted documents: optimize? expungeDeletes? Natural merge?

Here are more details about the need to purge documents.

My Solr cluster is very expensive, so we would like to keep the cost where it is and avoid scaling up if possible. The Solr index is being written at a rate > 100 TPS.

We also have a requirement to delete old data, so we are continuously trimming millions of documents daily that are older than X years.

With the current natural merge strategy, I need to update solrconfig.xml and increase maxMergedSegmentMB often so that I can reclaim physical disk space.

I am wondering whether a feature that rewrites a single large merged segment into another segment, purging deleted documents in the process, would be useful for use cases like mine. It would help purge deleted documents without the need to keep increasing maxMergedSegmentMB.

Thanks,
Moulay

On Fri, Oct 23, 2020 at 11:10 AM Erick Erickson <erickerick...@gmail.com> wrote:

> Well, you mentioned that the segments you’re concerned about were merged a year ago. If segments aren’t being merged, they’re pretty static.
>
> There’s no real harm in optimizing _occasionally_, even in an NRT index. If you have segments that were merged that long ago, you may be indexing continually but it sounds like it’s a situation where you update more recent docs rather than random ones over the entire corpus.
>
> That caution is more for indexes where you essentially replace docs in your corpus randomly, and it’s really about wasting a lot of cycles rather than bad stuff happening. When you randomly update documents (or delete them), the extra work isn’t worth it.
>
> Either operation will involve a lot of CPU cycles and can require that you have at least as much free space on your disk as the indexes occupy, so do be aware of that.
>
> All that said, what evidence do you have that this is worth any effort at all? Depending on the environment, you may not even be able to measure performance changes so this all may be irrelevant anyway.
>
> But to your question. Yes, you can cause regular merging to more aggressively merge segments with deleted docs by setting deletesPctAllowed in solrconfig.xml. The default value is 33, and you can set it as low as 20 or as high as 50. We put a floor of 20% because the cost starts to rise quickly if it’s lower than that, and expungeDeletes is a better alternative at that point.
>
> This is not a hard number, and in practice the percentage of your index that consists of deleted documents tends to be lower than this number, depending of course on your particular environment.
>
> Best,
> Erick
>
> > On Oct 23, 2020, at 12:59 PM, Moulay Hicham <maratusa.t...@gmail.com> wrote:
> >
> > Thanks Erick.
> >
> > My index is near real time and frequently updated.
> > I checked this page
> > https://lucene.apache.org/solr/guide/8_1/uploading-data-with-index-handlers.html#xml-update-commands
> > and using forceMerge/expungeDeletes is NOT recommended.
> >
> > So I was hoping that the change in mergePolicyFactory will affect the segments with a high percentage of deletes as part of the REGULAR segment merging cycles. Is my understanding correct?
> >
> > On Fri, Oct 23, 2020 at 9:47 AM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Just go ahead and optimize/forceMerge, but do _not_ optimize to one segment. Or you can expungeDeletes, which will rewrite all segments with more than 10% deleted docs. As of Solr 7.5, these operations respect the 5G limit.
> >>
> >> See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Oct 23, 2020, 12:36 Moulay Hicham <maratusa.t...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am using Solr 8.1 in production. We have about 30%-50% deleted documents in some old segments that were merged a year ago.
> >>>
> >>> These segments are about 5GB each.
> >>>
> >>> I was wondering why these segments have a high % of deleted docs and found out that they are NOT candidates for merging because the default TieredMergePolicy maxMergedSegmentMB is 5G.
> >>>
> >>> So I have modified the TieredMergePolicyFactory config as below to lower the deleted docs %:
> >>>
> >>> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
> >>>   <int name="maxMergeAtOnce">10</int>
> >>>   <int name="segmentsPerTier">10</int>
> >>>   <double name="maxMergedSegmentMB">12000</double>
> >>>   <double name="deletesPctAllowed">20</double>
> >>> </mergePolicyFactory>
> >>>
> >>> Do you see any issues with increasing the max merged segment to 12GB and lowering deletesPctAllowed to 20%?
> >>>
> >>> Thanks,
> >>>
> >>> Moulay
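
For reference, the two explicit purge operations mentioned in this thread, expungeDeletes and optimize, can be sent to the update handler as XML update commands, as described on the Solr guide page linked in the thread. The following is only a minimal sketch, not a recommendation: the waitSearcher values are assumptions chosen for illustration, and each command would be posted as its own request body.

```xml
<!-- Request body 1: commit with expungeDeletes.
     Per the thread, this rewrites segments with more than 10% deleted docs. -->
<commit waitSearcher="false" expungeDeletes="true"/>

<!-- Request body 2: optimize/forceMerge without maxSegments, so the index
     is not forced down to a single segment (waitSearcher is an assumption). -->
<optimize waitSearcher="false"/>
```

As noted earlier in the thread, either operation costs a lot of CPU and can require free disk space comparable to the size of the index, so both fit occasional runs better than a continuous schedule.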