David

My content is organized as some number of articles grouped under a common title.

I have three SOLR instances containing:

- Instance 1 - All 'live' articles - ~750K articles, 3-4 KB each
- Instance 2 - All 'live' titles - ~95K titles, < 1 KB each
- Instance 3 - All articles and titles - ~1.2M articles + titles

I created Instance 1 and Instance 2 to provide fast response for heavy query usage on 'live' articles and 'live' titles. I use Instance 3 for all low-volume, complex queries.

(All above as preamble)

My current JVM settings are:

- Instance 1 - -Xms256m -Xmx2000m
- Instance 2 - -Xms256m -Xmx1000m
- Instance 3 - -Xms256m -Xmx2000m

I'm in the middle of tuning the application. These values reflect optimization for document indexing. Haven't looked at the query side yet.

Notes

I'm using 'top' to look at process sizes (Red Hat 4.x, dual-core Xeon, 4 GB RAM).

For instance 1, I could probably get away with -Xmx1000m - but I think it's just a matter of (a short) time until I need to increase that limit. For instance 2, it currently runs in steady state at 1.2 - 1.4 GB max, so I boosted to 2 GB max.
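
The check described in these notes (process size against the configured maximum heap) can be roughly scripted instead of read off 'top'. The sketch below is an illustration only, not part of the original setup: `xmx_mb` and `heap_report` are hypothetical helper names, and the resident size reported by `ps` includes JVM memory outside the heap, so the numbers are approximate.

```shell
#!/bin/sh
# Extract the -Xmx value (in MB) from a java command line.
# Handles m/M and g/G suffixes; prints nothing if no -Xmx is present.
xmx_mb() {
    printf '%s\n' "$1" |
        sed -n 's/.*-Xmx\([0-9][0-9]*\)\([mMgG]\).*/\1 \2/p' |
        awk '{ print ($2 == "g" || $2 == "G") ? $1 * 1024 : $1 }'
}

# Print resident size next to the configured maximum heap for one PID.
# RSS covers the whole process, not just the Java heap.
heap_report() {
    rss_kb=$(ps -o rss= -p "$1" | tr -d ' ')
    args=$(ps -o args= -p "$1")
    echo "pid=$1 rss=$((rss_kb / 1024))MB xmx=$(xmx_mb "$args")MB"
}
```

Calling `heap_report` with the process ID of a Solr JVM prints one line per call; a resident size that sits at or above the `-Xmx` value for long periods is a hint that the heap limit deserves a closer look.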

Regards,

Tracy

On May 11, 2008, at 8:31 AM, David Pratt wrote:

Hi Tracy. Can you tell me how much of a change in max heap space produced the improvement, that is, your before and after max heap settings? Many thanks.

Regards,
David

Tracy Flynn wrote:
Thanks for the replies.
For a completely different reason, I happened to look at the memory stats for all processes including the SOLR instances. Noticed that the SLOW Solr instance was maxing out with more virtual memory than allocated. After boosting the maximum heap space and restarting, everything started to run at 4x-5x the speed before the fix - and at the rate I reasonably thought it should.
Tracy
On May 9, 2008, at 8:02 AM, Tracy Flynn wrote:
Hi,

I'm starting to see a significant slowdown in loading performance after I have loaded about 400K documents. I go from a load rate of about 40 docs/sec to 20-25 docs/sec.

Am I correct in assuming that, during indexing operations, Lucene/SOLR tries to hold as much of the index in memory as possible? If so, does the slowdown indicate a need to increase the JVM heap space?

Any ideas / help would be appreciated.

Regards,

Tracy

---------------------------------------------------------------------------------------------------------------------

Details

Documents are loaded as XML via POST in batches of 1000, with a commit after each batch.
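
The loading procedure described here (POST a batch, then commit) might look like the following with curl. This is a sketch under assumptions rather than the actual loader: the update URL is the default for Solr's example Jetty setup and may differ on this system, each batch file is assumed to already contain `<add>...</add>` XML, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed default; override by exporting SOLR_UPDATE_URL before running.
SOLR_UPDATE_URL="${SOLR_UPDATE_URL:-http://localhost:8983/solr/update}"

# POST one file of <add><doc>...</doc></add> XML to the update handler.
post_batch() {
    curl -s -f "$SOLR_UPDATE_URL" \
        -H 'Content-Type: text/xml; charset=utf-8' \
        --data-binary "@$1"
}

# Ask Solr to commit everything posted so far.
commit_batch() {
    curl -s -f "$SOLR_UPDATE_URL" \
        -H 'Content-Type: text/xml; charset=utf-8' \
        --data-binary '<commit/>'
}

# Post every .xml batch file in a directory, committing after each one.
# Stops at the first failed POST or commit.
load_batches() {
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue   # directory had no .xml files
        post_batch "$f" && commit_batch || return 1
    done
}
```

Running `load_batches` with a directory of batch files posts them in glob order; nothing is sent until one of the functions is called.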

Total current documents ~ 450,000
Avg document size: 4KB
One indexed text field contains 3KB or so. (body field below - standard type 'text')

Dual Xeon 3 GHz, 4 GB memory

SOLR JVM Startup options

java -Xms256m -Xmx1000m -jar start.jar


Relevant portion of the schema follows


<field name="document_id" type="string" indexed="true" stored="true" required="true"/>
<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="languages" type="string" indexed="true" stored="true" required="false"/>
<!-- The value specified for folding_id must be a field of type "integer" -
     type "sint" does not work -->
<field name="folding_id" type="integer" indexed="true" stored="true" required="false" default="0"/>
<field name="document_type" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" indexed="true" stored="true" required="false"/>
<field name="body" type="text" indexed="true" stored="true" required="false" compressed="true"/>
<field name="teaser" type="text" indexed="no" stored="true" required="false"/>
<field name="articles_in_category" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="pen_name" type="text" indexed="true" stored="true" required="false"/>
<field name="article_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="article_status_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="user_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="user_name" type="text" indexed="true" stored="true" required="false"/>
<field name="user_email" type="text" indexed="true" stored="true" required="false"/>
<field name="channel_context" type="sint" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="category_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="category_status_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="category_title" type="text" indexed="true" stored="true" required="false"/>
<field name="category_keywords" type="text" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="category_type" type="text" indexed="true" stored="true" required="false"/>
<field name="channel_id" type="sint" indexed="true" stored="true" required="false" default="0"/>
<field name="channel_title" type="text" indexed="true" stored="true" required="false"/>
<field name="helium_rank" type="sint" indexed="false" stored="true" required="false" default="0"/>
<field name="helium_rank_percentile" type="sfloat" indexed="false" stored="true" required="false"/>
<field name="helium_scaled_rank_boost" type="sfloat" indexed="true" stored="true" required="false"/>
<field name="helium_scaled_rank_boost_string" type="string" indexed="true" stored="true" required="false"/>
<!--
<field name="title_popularity" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_recent_popularity" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_views_measure" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_recent_earnings_measure" type="sint" indexed="true" stored="true" default="0"/>
<field name="title_earnings_measure" type="sint" indexed="true" stored="true" default="0"/>
-->
<field name="created_date" type="date" indexed="true" stored="true" required="false"/>



