On Thu, Aug 2, 2012 at 7:53 AM, roz dev <rozde...@gmail.com> wrote:
> Thanks Robert for these inputs.
>
> Since we do not really need the Snowball analyzer for this field, we would not use
> it for now. If this still does not address our issue, we would tweak the thread
> pool as per eks dev's suggestion - I am a bit hesitant to make this change yet, as
> we would be reducing the thread pool, which can adversely impact our throughput.
>
> If the Snowball Filter is being optimized for Solr 4 beta then it would be
> great for us. If you have already filed a JIRA for this then please let me
> know and I would like to follow it.

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279 and it seems fixed.
Given the massive commit last night, it's already committed and backported,
so it will be in 4.0-BETA.

simon

>
> Thanks again
> Saroj
>
> On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rcm...@gmail.com> wrote:
>
>> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <rozde...@gmail.com> wrote:
>> > Hi All
>> >
>> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
>> > when we are indexing lots of data with 16 concurrent threads, Heap grows
>> > continuously. It remains high and ultimately most of the stuff ends up
>> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
>> > getting into excessive GC problem.
>>
>> Hi: I don't claim to know anything about how tomcat manages threads,
>> but really you shouldn't have all these objects.
>>
>> In general snowball stemmers should be reused per-thread-per-field.
>> But if you have a lot of fields*threads, especially if there really is
>> high thread churn on tomcat, then this could be bad with snowball:
>> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>>
>> I think it would be useful to see if you can tune tomcat's threadpool
>> as he describes.
>>
>> separately: Snowball stemmers are currently really ram-expensive for
>> stupid reasons.
>> each one creates a ton of Among objects, e.g. an EnglishStemmer today
>> is about 8KB.
>>
>> I'll regenerate these and open a JIRA issue: as the snowball code
>> generator in their svn was improved
>> recently and each one now takes about 64 bytes instead (the Amongs
>> are static and reused).
>>
>> Still this won't really "solve your problem", because the analysis
>> chain could have other heavy parts
>> in initialization, but it seems good to fix.
>>
>> As a workaround until then you can also just use the "good old
>> PorterStemmer" (PorterStemFilterFactory in solr).
>> It's not exactly the same as using Snowball(English) but it's pretty
>> close and also much faster.
>>
>> --
>> lucidimagination.com
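
To put rough numbers on the fields*threads point above, here is a back-of-the-envelope sketch. The per-stemmer sizes (about 8 KB before regeneration, about 64 bytes after) are the figures quoted in the thread; the thread and field counts are purely hypothetical placeholders, and the function name is made up for illustration. It counts only stemmer instances, not the rest of the analysis chain.

```python
# Rough estimate of heap held by Snowball stemmers that are cached
# per thread and per field. Only the two sizes come from the thread;
# the thread and field counts are hypothetical.

def stemmer_footprint_bytes(threads: int, fields: int, bytes_per_stemmer: int) -> int:
    """One stemmer instance is assumed per (thread, field) pair."""
    return threads * fields * bytes_per_stemmer

OLD_STEMMER_BYTES = 8 * 1024  # "about 8KB" per EnglishStemmer before the fix
NEW_STEMMER_BYTES = 64        # "about 64 bytes" once the Amongs are static

threads = 200  # hypothetical: threads holding analyzer state at once
fields = 50    # hypothetical: number of stemmed fields

old_total = stemmer_footprint_bytes(threads, fields, OLD_STEMMER_BYTES)
new_total = stemmer_footprint_bytes(threads, fields, NEW_STEMMER_BYTES)

print(f"before: {old_total / 2**20:.1f} MiB")
print(f"after:  {new_total / 2**20:.2f} MiB")
```

Whatever counts you plug in, the ratio between the two totals stays at 128x, which is why the regenerated stemmers help even if the thread pool is left untouched.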