Hi everyone,

Is there any chance of getting this backported for a 3.6.2?

Regards,
Laurent

2012/8/2 Simon Willnauer <simon.willna...@gmail.com>

> On Thu, Aug 2, 2012 at 7:53 AM, roz dev <rozde...@gmail.com> wrote:
> > Thanks Robert for these inputs.
> >
> > Since we do not really need the Snowball analyzer for this field, we
> > would not use it for now. If this still does not address our issue, we
> > would tweak the thread pool as per eks dev's suggestion - I am a bit
> > hesitant to make this change yet, as we would be reducing the thread
> > pool, which can adversely impact our throughput.
> >
> > If the Snowball filter is being optimized for Solr 4 beta then it would
> > be great for us. If you have already filed a JIRA for this then please
> > let me know and I would like to follow it.
>
> AFAIK Robert already created an issue here:
> https://issues.apache.org/jira/browse/LUCENE-4279
> and it seems fixed. Given the massive commit last night it's already
> committed and backported, so it will be in 4.0-BETA.
>
> simon
> >
> > Thanks again
> > Saroj
> >
> > On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rcm...@gmail.com> wrote:
> >
> >> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <rozde...@gmail.com> wrote:
> >> > Hi All
> >> >
> >> > I am using Solr 4 from trunk and using it with Tomcat 6. I am
> >> > noticing that when we are indexing lots of data with 16 concurrent
> >> > threads, the heap grows continuously. It remains high and ultimately
> >> > most of the stuff ends up being moved to Old Gen. Eventually, Old Gen
> >> > also fills up and we start getting into an excessive GC problem.
> >>
> >> Hi: I don't claim to know anything about how tomcat manages threads,
> >> but really you shouldn't have all these objects.
> >>
> >> In general snowball stemmers should be reused per-thread-per-field.
> >> But if you have a lot of fields*threads, especially if there really is
> >> high thread churn on tomcat, then this could be bad with snowball:
> >> see eks dev's comment on
> >> https://issues.apache.org/jira/browse/LUCENE-3841
> >>
> >> I think it would be useful to see if you can tune tomcat's threadpool
> >> as he describes.
> >>
> >> Separately: Snowball stemmers are currently really RAM-expensive for
> >> stupid reasons. Each one creates a ton of Among objects, e.g. an
> >> EnglishStemmer today is about 8KB.
> >>
> >> I'll regenerate these and open a JIRA issue, as the snowball code
> >> generator in their svn was improved recently and each one now takes
> >> about 64 bytes instead (the Amongs are static and reused).
> >>
> >> Still, this won't really "solve your problem", because the analysis
> >> chain could have other heavy parts in initialization, but it seems
> >> good to fix.
> >>
> >> As a workaround until then you can also just use the "good old
> >> PorterStemmer" (PorterStemFilterFactory in solr). It's not exactly the
> >> same as using Snowball(English) but it's pretty close and also much
> >> faster.
> >>
> >> --
> >> lucidimagination.com
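
To make the two points quoted above a bit more concrete (reuse one
stemmer per thread per field, and keep the rule tables static so every
instance shares them), here is a minimal sketch in plain Java 8+. It is
not the Lucene or Snowball code: ToyStemmer, SuffixRule, and stemmerFor
are hypothetical names made up for illustration, and the three suffix
rules are only a tiny stand-in for a real stemming algorithm.

```java
import java.util.HashMap;
import java.util.Map;

class StemmerReuseSketch {

    /** Hypothetical stand-in for a Snowball "Among" entry: an immutable suffix rule. */
    static final class SuffixRule {
        final String suffix;
        final String replacement;

        SuffixRule(String suffix, String replacement) {
            this.suffix = suffix;
            this.replacement = replacement;
        }
    }

    /** Hypothetical toy stemmer; NOT the Lucene/Snowball EnglishStemmer. */
    static final class ToyStemmer {
        // Static and immutable: built once per class and shared by every
        // instance, so creating another stemmer does not copy the table.
        private static final SuffixRule[] RULES = {
            new SuffixRule("sses", "ss"),
            new SuffixRule("ies", "i"),
            new SuffixRule("ss", "ss"),
            new SuffixRule("s", ""),
        };

        // Per-instance mutable scratch state. Because of this buffer, one
        // instance must not be used by two threads at the same time.
        private final StringBuilder buffer = new StringBuilder();

        String stem(String word) {
            buffer.setLength(0);
            buffer.append(word);
            for (SuffixRule rule : RULES) {
                if (word.endsWith(rule.suffix)) {
                    // Apply only the first matching rule.
                    buffer.setLength(word.length() - rule.suffix.length());
                    buffer.append(rule.replacement);
                    break;
                }
            }
            return buffer.toString();
        }
    }

    // Each thread lazily gets its own map of field name -> stemmer.
    private static final ThreadLocal<Map<String, ToyStemmer>> PER_THREAD =
        ThreadLocal.withInitial(HashMap::new);

    /** Returns the calling thread's stemmer for the field, creating it on first use. */
    static ToyStemmer stemmerFor(String field) {
        return PER_THREAD.get().computeIfAbsent(field, f -> new ToyStemmer());
    }

    public static void main(String[] args) {
        System.out.println(stemmerFor("title").stem("ponies"));         // prints "poni"
        System.out.println(stemmerFor("title") == stemmerFor("title")); // prints "true"
    }
}
```

With this layout the number of live stemmers is roughly fields*threads,
which is why high thread churn in the container matters: every new
thread builds its own set, and the old sets only become garbage once
their threads die. Making the tables static shrinks each instance but
does not change that count.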