All, I'm having some performance issues with solr. I will give some background on our setup and implementation of solr. I'm completely open to reworking everything if the way we are currently doing things is not optimal. I'll try to be as verbose as I can in explaining all of this, but feel free to ask more questions if something doesn't make sense.
Firstly, we have three messageboards of varying traffic, totaling about 225K hits per day. Search is used maybe 500 times a day. Each board has its own two instances of solr, with Tomcat as the container, loaded via JNDI. One instance is for topics, one instance for the posts themselves. I feel as though this may not be optimal, but I can't think of a better way to handle this. After reading the schema, maybe someone will have some better ideas. We use PHP to interface with solr, and we do some sorting on relevance and on the date; my thought was that the sorting could be what causes solr to run out of memory.

The boards are bco, vlv and wbc. I'll list the number of docs for each below along with how many are added per day.

bco (topics): 180,530 (~200 added daily)
bco (posts): 3,961,053 (~5,000 added daily)
vlv (topics): 3,817 (~200 added daily)
vlv (posts): 84,005 (~7,000 added daily)
wbc (topics): 29,603 (~50 added daily)
wbc (posts): 739,660 (~1,000 added daily)
total: ~5 million docs, with ~13.5K added per day.

We add docs at :00 for bco, :20 for wbc, and :40 for vlv. We feel an hour is a short enough interval that results aren't lagged too much. The add process is fast, as is the commit, and I'm more than impressed with solr's ability to handle the load it does.

The server hardware is 4GB of memory, one dual-core 2GHz Opteron, and RAID 10 SATA disks. The same machine also runs PostgreSQL, PHP and Apache. I feel that this isn't optimal either, but the cost of buying another server to separate out either the solr or the Postgres component is too great right now.

Most of the errors I see are the jvm running out of heap space. The jvm is set to use the default for max heap size (256m I think?). I can't increase it too much, because Postgres needs as much memory as it can get so that the databases still reside in memory.
My first implementation of search for these sites was with pyLucene, and while that was fast, there was some sort of bug where docs I added to the index wouldn't show up until I optimized the index. Optimizing eventually ate up too much cpu and hosed the server while it ran; it started taking upwards of 2 hours at 99% cpu, and that's just no good. :)

When I set up solr, I had cache warming enabled, and that also caused the server to choke way too soon. So I turned that off, and that seemed to hold things off for a while.

I've attached the schemas and configs to this email so you can see how we have things set up. Every site is the same (config-wise), so just the names are different. It's relatively simple and I feel like the jvm shouldn't be choking so soon, but, who knows. :)

One thought we had was having just two instances of solr (one for topics, one for posts), with a board_id field and the id field together as the unique id, but I wasn't sure if solr supports compound unique ids. If not, that would make that solution moot.

Hopefully this makes sense, but if not, ask me for clarification on whatever is unclear. Thanks in advance for your help and suggestions!

Ian
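P.S. To make the combined-index idea a bit more concrete, here is a rough sketch of what I'm picturing for the posts schema. Since I don't know whether solr supports compound unique ids, this assumes a workaround instead: the PHP indexer would glue the board name and the post id together into one string key. The schema name "posts" and the field names "uid" and "board_id" are made up for this sketch; nothing like this is running yet.

```xml
<?xml version="1.0" ?>
<!-- Sketch only: one combined posts index for all three boards.
     "posts", "uid" and "board_id" are hypothetical names. -->
<schema name="posts" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
  </types>
  <fields>
    <!-- Assumed to be built by the indexer as board + "-" + post id, e.g. "bco-12345" -->
    <field name="uid" type="string" indexed="true" stored="true"/>
    <!-- Which board the post belongs to: bco, vlv or wbc -->
    <field name="board_id" type="string" indexed="true" stored="true"/>
    <!-- Original per-board post id, no longer unique on its own -->
    <field name="id" type="integer" indexed="true" stored="true"/>
    <!-- member_id, date, body, bodyExact and topic_id would stay as in the attached schema -->
  </fields>
  <uniqueKey>uid</uniqueKey>
</schema>
```

Searches for a single board would then presumably add a filter along the lines of fq=board_id:bco. Whether this actually uses less memory than six separate instances is exactly what I don't know.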
<?xml version="1.0" ?>
<schema name="bco_posts" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldtype name="long" class="solr.LongField" omitNorms="true"/>
    <fieldtype name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldtype name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="text_greek" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
    </fieldtype>
    <fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <!-- index-time analysis -->
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <!-- query-time analysis: adds synonyms, no catenation -->
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
    <fieldtype name="textTight" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>
  </types>
  <fields>
    <field name="id" type="integer" indexed="true" stored="true"/>
    <field name="member_id" type="integer" indexed="true" stored="true" omitNorms="true"/>
    <field name="date" type="date" indexed="true" stored="false"/>
    <field name="body" type="text" indexed="true" stored="false"/>
    <field name="bodyExact" type="text_ws" indexed="true" stored="false" omitNorms="true"/>
    <field name="topic_id" type="integer" indexed="false" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>body</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
  <copyField source="body" dest="bodyExact"/>
</schema>
<?xml version="1.0" ?>
<config>
  <dataDir>/opt/db/solr/bco_posts</dataDir>
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
  </indexDefaults>
  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>
    </autoCommit>
  </updateHandler>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <enableLazyFieldLoading>false</enableLazyFieldLoading>
    <useColdSearcher>false</useColdSearcher>
  </query>
  <requestHandler name="standard" class="solr.StandardRequestHandler">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>
  <admin>
    <defaultQuery>solr</defaultQuery>
    <gettableFiles>solrconfig.xml schema.xml admin-extra.html</gettableFiles>
    <pingQuery>
      qt=dismax&amp;q=solr&amp;start=3&amp;fq=id:[* TO *]&amp;fq=cat:[* TO *]
    </pingQuery>
  </admin>
</config>