Re: Solr memory use, jmap and TermInfos/tii

2010-09-13 Thread Michael McCandless
On Mon, Sep 13, 2010 at 6:29 PM, Burton-West, Tom wrote: > Thanks Robert and everyone! > > I'm working on changing our JVM settings today, since putting Solr 1.4.1 into > production will take a bit more work and testing.  Hopefully, I'll be able to > test the setTermIndexDivisor on our test serv

RE: Solr memory use, jmap and TermInfos/tii

2010-09-13 Thread Burton-West, Tom
o see if we can provide you with our tii/tis data. I'll let you know as soon as I hear anything. Tom -----Original Message----- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Sunday, September 12, 2010 10:48 AM To: solr-user@lucene.apache.org; simon.willna...@gmail.com Subject: Re: Solr

Re: Solr memory use, jmap and TermInfos/tii

2010-09-12 Thread Robert Muir
On Sun, Sep 12, 2010 at 9:57 AM, Simon Willnauer <simon.willna...@googlemail.com> wrote: > > To change the divisor in your solrconfig, for example to 4, it looks like > > you need to do this. > > > > <indexReaderFactory name="IndexReaderFactory" class="org.apache.solr.core.StandardIndexReaderFactory"> > > <int name="setTermIndexDivisor">4</int> > > </indexReaderFactory> > > Ah, thanks robert

Re: Solr memory use, jmap and TermInfos/tii

2010-09-12 Thread Simon Willnauer
On Sun, Sep 12, 2010 at 12:42 PM, Robert Muir wrote: > On Sat, Sep 11, 2010 at 7:51 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> On Sat, Sep 11, 2010 at 11:07 AM, Burton-West, Tom >> wrote: >> >  Is there an example of how to set up the divisor parameter in >> solrconfig.xml

Re: Solr memory use, jmap and TermInfos/tii

2010-09-12 Thread Robert Muir
On Sat, Sep 11, 2010 at 7:51 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Sat, Sep 11, 2010 at 11:07 AM, Burton-West, Tom > wrote: > > Is there an example of how to set up the divisor parameter in > solrconfig.xml somewhere? > > Alas I don't know how to configure terms index d

Re: Solr memory use, jmap and TermInfos/tii

2010-09-12 Thread Michael McCandless
One thing that the Codec API makes possible ("in theory", anyway)... is variable gap terms index. Ie, Lucene today makes an indexed term at regular (every N -- 128 in 3.x, 32 in 4.0) intervals. But this is rather silly. Imagine the terms you are going through are all singletons (happen only in o
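For readers following the thread, the fixed-interval scheme described in this message can be sketched roughly as follows. This is a toy illustration and not Lucene's implementation: the names `build_terms_index` and `lookup` are hypothetical, and the terms dictionary is modeled as a plain sorted Python list. Only every N-th term is held in RAM; a lookup binary-searches those sampled terms and then scans at most N entries of the full dictionary.

```python
from bisect import bisect_right


def build_terms_index(sorted_terms, interval=128):
    """Sample every `interval`-th term; only these samples live in RAM."""
    return [(sorted_terms[i], i) for i in range(0, len(sorted_terms), interval)]


def lookup(sorted_terms, index, term, interval=128):
    """Binary-search the sampled terms, then scan at most `interval` terms."""
    keys = [sampled for sampled, _ in index]
    slot = bisect_right(keys, term) - 1
    if slot < 0:
        return None  # term sorts before the first dictionary entry
    start = index[slot][1]
    end = min(start + interval, len(sorted_terms))
    for pos in range(start, end):
        if sorted_terms[pos] == term:
            return pos
    return None


# Hypothetical data: 1000 zero-padded terms, so sort order matches numbering.
terms = sorted(f"term{i:05d}" for i in range(1000))
index = build_terms_index(terms, interval=128)
```

A variable-gap index, as proposed in the message, would choose the sample points by some other criterion instead of a constant stride; the lookup shape stays the same.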

Re: Solr memory use, jmap and TermInfos/tii

2010-09-11 Thread Simon Willnauer
On Sun, Sep 12, 2010 at 1:51 AM, Michael McCandless wrote: > On Sat, Sep 11, 2010 at 11:07 AM, Burton-West, Tom wrote: >>  Is there an example of how to set up the divisor parameter in >> solrconfig.xml somewhere? > > Alas I don't know how to configure terms index divisor from Solr... You can s

Re: Solr memory use, jmap and TermInfos/tii

2010-09-11 Thread Michael McCandless
On Sat, Sep 11, 2010 at 11:07 AM, Burton-West, Tom wrote: >  Is there an example of how to set up the divisor parameter in solrconfig.xml > somewhere? Alas I don't know how to configure terms index divisor from Solr... >>>In 4.0, w/ flex indexing, the RAM efficiency is much better -- we use lar

Re: Solr memory use, jmap and TermInfos/tii

2010-09-11 Thread Lance Norskog
There is a trick: facets with only one occurrence tend to be misspellings or dirt. You write a program to fetch the terms (Lucene's CheckIndex is a great starting point) and create a stopwords file. Here's a data mining project: which languages are more vulnerable to dirty OCR? Burton-West, Tom w
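A minimal sketch of the stopwords idea above, assuming the terms and their document frequencies have already been exported as (term, doc_freq) pairs by some other tool. The export step is not shown, and the function names and sample terms are hypothetical.

```python
def singleton_terms(term_doc_freqs):
    """Return, in sorted order, the terms that occur in exactly one document."""
    return sorted(term for term, doc_freq in term_doc_freqs if doc_freq == 1)


def write_stopwords(terms, path):
    """Write one term per line, the usual layout for a stopwords file."""
    with open(path, "w", encoding="utf-8") as handle:
        for term in terms:
            handle.write(term + "\n")


# Hypothetical sample: two OCR-style singletons among real terms.
sample = [("the", 900), ("tbe", 1), ("solr", 40), ("lucenc", 1)]
stopwords = singleton_terms(sample)
```

Whether a singleton is dirt or a legitimate rare word is a judgment call, so the resulting list is worth reviewing before it is used at index time.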

RE: Solr memory use, jmap and TermInfos/tii

2010-09-11 Thread Burton-West, Tom
Thanks Mike, >>Do you use a terms index divisor? Setting that to 2 would halve the >>amount of RAM required but double (on average) the seek time to locate >>a given term (but, depending on your queries, that seek time may still >>be a negligible part of overall query time, ie the tradeoff could
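The tradeoff quoted above can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: the one-billion-term count is a made-up figure, the index interval of 128 is the 3.x value mentioned elsewhere in this thread, and the assumption that a lookup scans half a gap on average is a simplification.

```python
def terms_index_cost(num_terms, index_interval=128, divisor=1):
    """Estimate indexed terms loaded into RAM and the average scan length.

    With a divisor d, only every d-th indexed term is loaded, so the gap
    between loaded terms grows to index_interval * d.
    """
    effective_gap = index_interval * divisor
    loaded_terms = -(-num_terms // effective_gap)  # ceiling division
    average_scan = effective_gap / 2  # assumed: target is mid-gap on average
    return loaded_terms, average_scan


# Hypothetical index with one billion unique terms.
for divisor in (1, 2, 4):
    loaded, scan = terms_index_cost(1_000_000_000, divisor=divisor)
    print(f"divisor={divisor}: {loaded:,} terms in RAM, ~{scan:.0f} terms scanned")
```

Each doubling of the divisor halves the number of terms held in RAM and doubles the expected scan, which matches the halve-RAM, double-seek description in the quoted text.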

Re: Solr memory use, jmap and TermInfos/tii

2010-09-11 Thread Michael McCandless
Unfortunately, the terms index (before 4.0) is not RAM efficient -- I wrote about this here: http://chbits.blogspot.com/2010/07/lucenes-ram-usage-for-searching.html Every indexed term that's loaded into RAM creates 4 objects (TermInfo, Term, String, char[]), as you see in your profiler output