Thanks so much, Shawn.

I am in a scenario with many inserts while searching, each <add> consisting
of ~500 documents. I will monitor the number of segments, keeping your
considerations in mind :-)

Regards,
Tommaso

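P.S. As a sanity check on the 21-segment figure in Shawn's breakdown quoted
below, here is a minimal Python sketch of the arithmetic. The function name
and the dictionary labels are made up for illustration, and it only adds up
the four groups Shawn lists; it does not model how Lucene actually schedules
merges.

```python
def peak_segment_breakdown(merge_factor: int) -> dict:
    """Count the segments that can sit on disk at once, following the
    four groups in the quoted explanation (an illustration, not Lucene's
    real merge logic)."""
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    return {
        "small segment being built right now": 1,
        "previous small segments": merge_factor,
        "merged segment being created": 1,
        "previous merged segments": merge_factor - 1,
    }


breakdown = peak_segment_breakdown(10)
total = sum(breakdown.values())
for label, count in breakdown.items():
    print(f"{count:3d}  {label}")
print(f"{total:3d}  total")
```

Under these assumptions the total is 2 * mergeFactor + 1, so lowering
mergeFactor lowers the number of segments that can pile up during a large
insert.
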
2010/11/4 Shawn Heisey <s...@elyograg.org>

> On 11/4/2010 3:27 AM, Tommaso Teofili wrote:
>
>> - Is mergeFactor a one-time configuration setting that is considered only
>> when creating the index for the first time, or can it be adjusted later,
>> even with some docs inside the index? E.g. I have mF set to 10, then I
>> realize I want quicker searches and I set it to 2 so that at the next
>> optimize/commit I will have no more than 2 segments. My understanding is
>> that one can adjust mF over time, is that right?
>
> The mergeFactor is applied anytime documents are added to the index, not
> just when it is built for the first time. You can adjust it later, and
> reload the core or restart Solr. It will apply to any additional indexing
> from that point forward.
>
> With a mergeFactor of 10, having 21 segments (and more) temporarily on the
> disk at the same time is reasonably possible. I know this applies if you
> are doing a continuous large insert; I'm not sure whether it applies if you
> are doing several small inserts separately. These segments are:
>
> * The small segment that is being built right now.
> * The previous 10 small segments.
> * The merged segment being created from those above.
> * The previous 9 merged segments.
>
> If it takes a really long time to merge the last 10 small segments and then
> merge the 10 large segments into an even larger segment, you can end up with
> even more small segments from your continuous insert. If it should take
> long enough that you actually get 10 more new small segments, the large
> merge will pause while it completes the small merge. I saw this happen
> recently when I decided to see what happens if I built a single shard from
> our entire database. It took a really long time, partly from that
> super-merge and the optimize that happened later, and took up 85GB of disk
> space.
>
> I'm not really sure what happens if you have this continue beyond a single
> super-merge like the one I have mentioned.
>
>> - In a replicated environment does it make sense to define different
>> mergeFactors on master and slave? I'd say no, since it influences the
>> number of segments created, which is a concern of whoever actually indexes
>> documents (the master), not of whoever receives (segments of) the index,
>> but please correct me if I am wrong.
>
> Because it only applies when indexes are being built, it has no meaning on
> a slave, which, as you said, just copies the data from the master.
>
> Shawn