If you are updating all the time, the only reason to forceMerge is to put the 
overhead of big merges at a known time. Otherwise, leave it alone.
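
For anyone who does want the big merge at a known time, a minimal sketch is
to post the explicit update command from a scheduled off-peak job (host,
port and attribute values here are illustrative):

  <optimize waitSearcher="true" maxSegments="1"/>

sent to http://master:8983/solr/update, where maxSegments="1" is the
traditional full optimize.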

wunder

On Oct 12, 2012, at 3:56 PM, Erick Erickson wrote:

> Right. If I've done the multiplication right, you're essentially replacing
> your entire index every day, given the rate you're adding documents: at
> 50-300 adds per second, that's roughly 4 to 26 million docs a day against
> a 15-million-doc index.
> 
> Have a look at MergePolicy, here are a couple of references:
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
> https://lucene.apache.org/core/old_versioned_docs/versions/3_2_0/api/core/org/apache/lucene/index/MergePolicy.html
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> 
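> For a concrete sketch, in Solr 3.x the merge policy is configured in
> solrconfig.xml under <indexDefaults> or <mainIndex>; the values below are
> illustrative, not recommendations:
> 
>   <!-- TieredMergePolicy is already the 3.x default; tune it rather
>        than replace it -->
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">10</int>
>     <int name="segmentsPerTier">10</int>
>   </mergePolicy>
> 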
> But unless you're having problems with performance, I'd consider just
> optimizing once a day at off-peak hours.
> 
> FWIW,
> Erick
> 
> On Fri, Oct 12, 2012 at 5:35 PM, Petersen, Robert <rober...@buy.com> wrote:
>> Hi Erick,
>> 
>> After reading the discussion you guys were having about renaming optimize to 
>> forceMerge, I realized I was guilty of exactly the over-optimizing you guys 
>> were worried about!  We have about 15 million docs indexed now and we push 
>> about 50-300 adds per second 24/7, most of them updates to existing 
>> documents whose data has changed since the last time it was indexed (which 
>> we keep track of in a DB table).  There are some new documents being added 
>> in the mix, and some deletes as well.
>> 
>> I understand now how the merge policy caps the number of segments.  I used 
>> to think they would grow unbounded and thus optimize was required.  How does 
>> our large number of updates to existing documents affect the need to 
>> optimize, since each update amounts to a delete plus a re-add?  I suppose 
>> that means the index size tends to grow, with the deleted docs hanging 
>> around in the background, as it were.
>> 
>> So in our situation, what frequency of optimize would you recommend?  We're 
>> on 3.6.1 btw...
>> 
>> Thanks,
>> Robi
>> 
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Thursday, October 11, 2012 5:29 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: anyone have any clues about this exception
>> 
>> Well, you'll actually still be able to optimize; it's just called forceMerge.
>> 
>> But the point is that optimize seems like something that _of course_ you 
>> want to do, when in reality it's not something you usually should do at all. 
>> Optimize does two things:
>> 1> merges all the segments into one (usually)
>> 2> removes all of the info associated with deleted documents.
>> 
>> Of the two, point <2> is the one that really counts, and that happens 
>> whenever segment merging is done anyway. So unless you have a very large 
>> number of deletes (or updates of the same documents), optimize buys you 
>> very little. You can tell how many deleted docs you're carrying by the 
>> difference between numDocs and maxDoc on the admin page.
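>> 
>> (One way to see those two numbers, assuming the Luke handler is enabled
>> and adjusting host/port for your setup, is a request along the lines of
>> 
>>   http://master:8983/solr/admin/luke?numTerms=0
>> 
>> where maxDoc minus numDocs is the number of deleted docs still taking up
>> space.)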
>> 
>> So what happens if you just don't bother to optimize? As an alternative, 
>> take a look at the merge policy to help control how merging happens.
>> 
>> Best
>> Erick
>> 
>> On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert <rober...@buy.com> wrote:
>>> You could be right.  Going back in the logs, I noticed it used to happen 
>>> less frequently and always towards the end of an optimize operation.  It is 
>>> probably my indexer timing out waiting for updates to occur during 
>>> optimizes.  The errors grew recently after I upped the indexer thread count 
>>> to 22 threads, so there are a lot more timeouts occurring now.  Also, our 
>>> index has grown to double its old size, so the optimize operation has 
>>> started taking a lot longer, which also contributes to what I'm seeing.  I 
>>> have just changed my optimize frequency from three times a day to once a 
>>> day after reading the following:
>>> 
>>> Here they are talking about completely deprecating the optimize
>>> command in the next version of Solr...
>>> https://issues.apache.org/jira/browse/SOLR-3141
>>> 
>>> 
>>> -----Original Message-----
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Wednesday, October 10, 2012 11:10 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: anyone have any clues about this exception
>>> 
>>> Something timed out and the other end closed the connection; this end then 
>>> tried to write to the closed pipe and died, and something that tried to 
>>> catch that exception and write its own error died even worse? I'm just 
>>> making this up really, but it sounds plausible (plus a hunch from three 
>>> years of Java tech support).
>>> 
>>> If it happens often enough, see if you can run Wireshark on that machine's 
>>> network interface and catch the whole network conversation in action. 
>>> Often there are enough clues there in the TCP packets and/or the data 
>>> transmitted. Wireshark is a power tool, so it takes a little while to learn 
>>> the first time, but that learning will pay for itself over and over again.
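>>> 
>>> For example, a capture filter along the lines of the one below (assuming
>>> Tomcat is listening on its default port 8080; adjust to your connector)
>>> cuts the capture down to just the Solr HTTP traffic:
>>> 
>>>   tcp port 8080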
>>> 
>>> Regards,
>>>   Alex.
>>> 
>>> Personal blog: http://blog.outerthoughts.com/
>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>> - Time is the quality of nature that keeps events from happening all
>>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>> book)
>>> 
>>> 
>>> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert <rober...@buy.com> wrote:
>>>> The Tomcat localhost log (not the catalina log) for my Solr 3.6.1 (master) 
>>>> instance contains lots of these exceptions, but Solr itself seems to be 
>>>> doing fine... any ideas?  I'm not seeing these exceptions logged on my 
>>>> slave servers btw, just on the master, where we do all our indexing.
>>>> 
>>>> 
>>>> 
>>>> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve
>>>> invoke
>>>> SEVERE: Servlet.service() for servlet default threw exception
>>>> java.lang.IllegalStateException
>>>>                at 
>>>> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
>>>>                at 
>>>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
>>>>                at 
>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
>>>>                at 
>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>                at 
>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>                at 
>>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>                at 
>>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>                at 
>>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>                at 
>>>> com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
>>>>                at 
>>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>                at 
>>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>                at 
>>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>>>                at 
>>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>>>                at 
>>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>>                at 
>>>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>>>                at java.lang.Thread.run(Unknown Source)
>>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org


