Help with tuning solr

Ian Meyer Tue, 13 Feb 2007 17:03:15 -0800

All,

I'm having some performance issues with Solr. I'll give some
background on our setup and how we've implemented Solr. I'm completely
open to reworking everything if the way we're currently doing things
is not optimal. I'll try to be as thorough as I can in explaining all
of this, but feel free to ask more questions if something doesn't make
sense.


Firstly, we have three message boards of varying traffic, totaling
about 225K hits per day. Search is used maybe 500 times a day. Each
board has two instances of Solr, with Tomcat as the container and each
instance loaded via JNDI: one instance for topics, and one for the
posts themselves. I feel as though this may not be optimal, but I
can't think of a better way to handle it. After reading the schema,
maybe someone will have some better ideas. We use PHP to interface
with Solr, and we sort on both relevance and date; my thought was that
the sorting could be causing Solr to run out of memory.
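For readers less familiar with the JNDI approach, in Tomcat each Solr webapp typically gets its own context file that points `solr/home` at that instance's directory. The sketch below shows what one of the six instances might look like; the file name, WAR path and `solr/home` path are hypothetical placeholders, not values taken from this setup.

```xml
<!-- Hypothetical file: $CATALINA_HOME/conf/Catalina/localhost/bco_posts.xml -->
<!-- One such file per instance; each points at its own Solr home directory. -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- Solr looks up java:comp/env/solr/home to find its conf/ directory -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home/bco_posts" override="true"/>
</Context>
```

Assuming all six contexts are deployed in a single Tomcat process, they share one JVM heap, so the memory used by every instance adds up against the same limit.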

The boards are bco, vlv and wbc. I'll list the number of docs for each
below, along with how many are added per day.

bco (topics): 180,530 (~200 added daily)
bco (posts): 3,961,053 (~5,000 added daily)
vlv (topics): 3,817 (~200 added daily)
vlv (posts): 84,005 (~7,000 added daily)
wbc (topics): 29,603 (~50 added daily)
wbc (posts):  739,660 (~1000 added daily)

total: ~5 million docs, with ~13.5K added per day.
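As a quick check on those totals, and to show how unevenly the documents are spread across the six instances:

```latex
\begin{aligned}
\text{docs} &= 180{,}530 + 3{,}961{,}053 + 3{,}817 + 84{,}005 + 29{,}603 + 739{,}660 = 4{,}998{,}668 \approx 5\,\text{M}\\
\text{added/day} &= 200 + 5{,}000 + 200 + 7{,}000 + 50 + 1{,}000 = 13{,}450 \approx 13.5\,\text{K}\\
\frac{3{,}961{,}053}{4{,}998{,}668} &\approx 0.79
\end{aligned}
```

So the bco posts index alone holds roughly 79% of all documents.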

We add docs at :00 for bco, :20 for wbc, and :40 for vlv. We feel an
hourly interval is short enough that results aren't lagged too much.
The add process is fast, as is the commit, and I'm more than impressed
with Solr's ability to handle the load it does.
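The hourly batches go through Solr's XML update interface. A minimal sketch of one batch for the posts instance, using the field names from the attached bco_posts schema, might look like the following; the URL and all field values are made-up placeholders. `bodyExact` is not sent because the schema's `copyField` fills it from `body`.

```xml
<!-- Request 1: POST to the instance's update handler,
     e.g. http://localhost:8080/bco_posts/update (hypothetical URL) -->
<add>
  <doc>
    <field name="id">123456</field>
    <field name="member_id">42</field>
    <field name="topic_id">7890</field>
    <!-- DateField expects ISO 8601 in UTC with a trailing Z -->
    <field name="date">2007-02-13T17:00:00Z</field>
    <field name="body">Example post text goes here</field>
  </doc>
  <!-- ...more <doc> elements for the rest of the hour's posts... -->
</add>

<!-- Request 2: sent separately once the whole batch has been added -->
<commit/>
```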

The server hardware is 4GB of memory, one dual-core 2GHz Opteron, and
RAID 10 SATA. The same machine also runs PostgreSQL, PHP and Apache. I
feel that this isn't optimal either, but the cost of buying another
server to separate out either the Solr or the Postgres component is
too great right now. Most of the errors I see are the JVM running out
of heap space. The JVM uses the default max heap size (256m, I
think?). I can't increase it much, because Postgres needs as much
memory as it can get so that the databases stay resident in memory.
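To put a rough number on the date-sorting suspicion: assuming the Lucene 2.x FieldCache behaviour, where sorting on a string-encoded field such as Solr's DateField builds an `int` ordinal for every document plus an array holding each distinct term, the per-document part of a date sort on bco posts alone is at least

```latex
4\ \text{bytes} \times 3{,}961{,}053\ \text{docs} = 15{,}844{,}212\ \text{bytes} \approx 15.8\ \text{MB}
```

plus one Java `String` per distinct date value. If most posts carry a distinct timestamp, that is close to four million strings, which on its own could take up a large share of a 256 MB heap. The cache is tied to the open index reader, so it may be rebuilt on the first date-sorted query after each hourly commit, and every other instance in the same JVM that sorts by date would add its own share.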

My first implementation of search for these sites was with PyLucene.
While that was fast, there was some sort of bug where docs I added to
the index wouldn't show up until I optimized it. Optimizing eventually
ate up too much CPU and hosed the server while it ran; it grew to take
upwards of 2 hours at 99% CPU, and that's just no good. :)

When I set up Solr, I had cache warming enabled, and that also caused
the server to choke way too soon. So I turned it off, and that seemed
to hold things off for a while.
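For comparison with the attached solrconfig.xml, whose `<query>` section declares no caches or warming listeners, this is roughly how caches with autowarming switched off are declared in the Solr 1.x example config format. The sizes are illustrative placeholders, not recommendations for this server; `autowarmCount="0"` is the setting that stops a new searcher from being pre-populated from the old searcher's cache after each commit.

```xml
<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <!-- Caches the document sets matched by filter queries; no autowarming -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- Caches ordered lists of document ids for query + sort combinations -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- Caches the stored fields of documents returned in results -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- No newSearcher/firstSearcher listeners, so no explicit warming queries run -->
  <useColdSearcher>false</useColdSearcher>
</query>
```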

I've attached the schemas and configs to this email so you can see how
we have things set up. Every site is the same config-wise; only the
names are different. It's relatively simple, and I feel like the JVM
shouldn't be choking so soon, but who knows. :)

One thought we had was to run two Solr instances, with a board_id
field alongside the id field to form the unique key, but I wasn't sure
whether Solr supports compound unique keys. If it doesn't, that
solution is moot.
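Assuming Solr accepts only a single `uniqueKey` field (the attached schema declares `<uniqueKey>id</uniqueKey>`), one common way to get the effect of a compound key is to build one string field from both parts at indexing time and keep `board_id` as its own field for filtering. A hypothetical merged posts schema might declare:

```xml
<!-- Inside <fields>: hypothetical key built as "<board>-<post id>", e.g. "bco-123456" -->
<field name="uid" type="string" indexed="true" stored="true"/>
<!-- Which board the post came from: bco, vlv or wbc -->
<field name="board_id" type="string" indexed="true" stored="true"/>
<!-- The original per-board post id, kept for lookups back into Postgres -->
<field name="id" type="integer" indexed="true" stored="true"/>

<!-- After </fields>: the composite string becomes the unique key -->
<uniqueKey>uid</uniqueKey>
```

Searches could then be restricted to one board with a filter such as `fq=board_id:bco` (or an equivalent clause in the main query), and the indexing code would be responsible for building `uid` the same way every time.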

Hopefully this makes sense, but if not, ask me for clarification on
whatever is unclear.

Thanks in advance for your help and suggestions!
Ian

<?xml version="1.0" ?>
<schema name="bco_posts" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldtype name="long" class="solr.LongField" omitNorms="true"/>
    <fieldtype name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldtype name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

    <fieldtype name="text_greek" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
    </fieldtype>
    <fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

    <fieldtype name="textTight" class="solr.TextField" positionIncrementGap="100" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

 </types>

 <fields>

   <field name="id" type="integer" indexed="true" stored="true"/>
   <field name="member_id" type="integer" indexed="true" stored="true" omitNorms="true"/>
   <field name="date" type="date" indexed="true" stored="false"/>
   <field name="body" type="text" indexed="true" stored="false"/>
   <field name="bodyExact" type="text_ws" indexed="true" stored="false" omitNorms="true" />
   <field name="topic_id" type="integer" indexed="false" stored="true"/>

 </fields>

 <uniqueKey>id</uniqueKey>

 <defaultSearchField>body</defaultSearchField>

 <solrQueryParser defaultOperator="OR"/>

 <copyField source="body" dest="bodyExact"/>

</schema>

<?xml version="1.0" ?>
<config>
  <dataDir>/opt/db/solr/bco_posts</dataDir>
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit> 
      <maxDocs>10000</maxDocs>
    </autoCommit>
  </updateHandler>


  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <enableLazyFieldLoading>false</enableLazyFieldLoading>
    <useColdSearcher>false</useColdSearcher>
  </query>

  <requestHandler name="standard" class="solr.StandardRequestHandler">
    <!-- default values for query parameters -->
     <lst name="defaults">
       <str name="echoParams">explicit</str>
     </lst>
  </requestHandler>
  <admin>
    <defaultQuery>solr</defaultQuery>
    <gettableFiles>solrconfig.xml schema.xml admin-extra.html</gettableFiles>
    <pingQuery>
     qt=standard&amp;q=solr&amp;start=3&amp;fq=id:[* TO *]
    </pingQuery>
  </admin>
</config>
