On Thu, Aug 2, 2012 at 7:53 AM, roz dev <rozde...@gmail.com> wrote:
> Thanks Robert for these inputs.
>
> Since we do not really need the Snowball analyzer for this field, we will
> not use it for now. If this still does not address our issue, we will tweak
> the thread pool as per eks dev's suggestion. I am a bit hesitant to make this
> change yet, as reducing the thread pool could adversely impact our throughput.
>
> If the Snowball filter is being optimized for Solr 4 beta, that would be
> great for us. If you have already filed a JIRA issue for this, please let me
> know; I would like to follow it.

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279
and it seems fixed. Given the massive commit last night, it's already
committed and backported, so it will be in 4.0-BETA.

simon
>
> Thanks again
> Saroj
>
> On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rcm...@gmail.com> wrote:
>
>> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <rozde...@gmail.com> wrote:
>> > Hi All
>> >
>> > I am using Solr 4 from trunk with Tomcat 6. I am noticing that when we
>> > are indexing lots of data with 16 concurrent threads, the heap grows
>> > continuously. It remains high and ultimately most of the stuff ends up
>> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
>> > running into excessive GC.
>>
>> Hi: I don't claim to know anything about how tomcat manages threads,
>> but really you shouldn't have all these objects.
>>
>> In general snowball stemmers should be reused per-thread-per-field.
>> But if you have a lot of fields*threads, especially if there really is
>> high thread churn on tomcat, then this could be bad with snowball:
>> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>>
>> I think it would be useful to see if you can tune tomcat's threadpool
>> as he describes.
>>
>> Separately: Snowball stemmers are currently really RAM-expensive for
>> stupid reasons. Each one creates a ton of Among objects, e.g. an
>> EnglishStemmer today is about 8KB.
>>
>> I'll regenerate these and open a JIRA issue, as the snowball code
>> generator in their svn was improved recently and each one now takes
>> about 64 bytes instead (the Among objects are static and reused).
>>
>> Still, this won't really "solve your problem", because the analysis
>> chain could have other parts that are heavy to initialize, but it
>> seems good to fix.
>>
>> As a workaround until then, you can also just use the "good old
>> PorterStemmer" (PorterStemFilterFactory in Solr). It's not exactly the
>> same as using Snowball(English), but it's pretty close and also much
>> faster.
>>
>> --
>> lucidimagination.com
>>
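
For readers following the per-thread-per-field reuse point above, here is a
rough sketch of the idea in plain Java. It is not Lucene's actual
implementation: FakeStemmer, stemmerFor, estimateBytes and the field names
are hypothetical and exist only for illustration. The 16 threads and the
8KB and 64 byte sizes are the figures quoted in the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class StemmerReuseSketch {

    /** Hypothetical stand-in for an expensive stemmer; NOT a Lucene class. */
    static final class FakeStemmer {
        static final AtomicInteger CREATED = new AtomicInteger();
        FakeStemmer() { CREATED.incrementAndGet(); }
    }

    // One map per thread, keyed by field name: reuse is per thread, per field.
    private static final ThreadLocal<Map<String, FakeStemmer>> PER_THREAD =
            ThreadLocal.withInitial(HashMap::new);

    /** Returns the calling thread's stemmer for a field, creating it once. */
    static FakeStemmer stemmerFor(String field) {
        return PER_THREAD.get().computeIfAbsent(field, f -> new FakeStemmer());
    }

    /** Rough estimate only: threads * fields * bytes per stemmer. */
    static long estimateBytes(int threads, int fields, long bytesPerStemmer) {
        return (long) threads * fields * bytesPerStemmer;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] fields = {"title", "body", "tags"}; // hypothetical field names
        Thread[] workers = new Thread[16];           // 16 threads, as in the report
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (String field : fields) {
                    stemmerFor(field); // first call on this thread creates it
                    stemmerFor(field); // second call reuses the same instance
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("stemmers created: " + FakeStemmer.CREATED.get());
        System.out.println("at 8 KB each: "
                + estimateBytes(16, fields.length, 8 * 1024) + " bytes");
        System.out.println("at 64 B each: "
                + estimateBytes(16, fields.length, 64) + " bytes");
    }
}
```

With a pool that keeps its threads alive, the instance count stays at
threads * fields. If the container keeps replacing threads, every new thread
builds its own copies again, which is the churn concern raised above.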
