Thanks to both of you, Otis and Chris, for your responses. We ran some benchmarks, and a few of the results surprised us: caching does not seem to affect performance much. Is that because the index is small?

Do these numbers look reasonable, or is there any room for improvement that you can think of?

Regards,
Ajanta.

Results from development servers

Solr HTTP Interface

Configurations

   * Index size is approximately 500 MB (slightly more)
   * Tomcat 6.0
   * Solr (nightly build dated 2007-04-19)
   * Nginx v0.5.20 is used as the load balancer (very lightweight in size,
     functionality and CPU consumption), distributing requests
     round-robin.
   * Grinder v3.0-beta33 was used for load testing. It lets you write
     custom test scripts (in Jython) and has a nice GUI for presenting
     results.
   * Server config: Intel® Xeon™ 3040, 1.87 GHz, 1066 MHz FSB, 4 GB RAM
     (300 MB used by the system after boot), 8 GB swap
   * The query list was custom-built from the web; some of the queries
     have AND/OR between terms. The territory field was always US.
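The tests themselves were driven by Grinder scripts, which are not shown here. As a rough illustration of how the "Threads", "Servers" and "Performance (queries/sec)" columns relate, here is a minimal CPython sketch of the same kind of test: a fixed pool of worker threads sends each query once, servers are picked round-robin, and throughput is total queries divided by wall-clock time (assumed to be how the Performance column is defined). In the real setup nginx does the round-robin; here the client does it so the sketch is self-contained, and `send_query` is a hypothetical placeholder for the actual HTTP request to Solr.

```python
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(queries, servers, threads, send_query):
    """Send each query once using `threads` workers; return (qps, servers_used).

    Servers are chosen round-robin, mimicking the nginx setup (an assumption:
    here the client rotates instead of nginx). `send_query(server, query)` is a
    hypothetical caller-supplied function that would issue the HTTP request.
    """
    rotation = itertools.cycle(servers)
    rotation_lock = threading.Lock()  # guard the shared iterator across threads

    def send_one(query):
        with rotation_lock:
            server = next(rotation)
        send_query(server, query)
        return server

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        servers_used = list(pool.map(send_one, queries))
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed, servers_used


if __name__ == "__main__":
    # Stand-in for a Solr request: pretend each query takes about 10 ms.
    def fake_send(server, query):
        time.sleep(0.01)

    qps, used = run_benchmark([f"query {i}" for i in range(400)],
                              ["solr1", "solr2"], threads=40, send_query=fake_send)
    print(f"{qps:.0f} queries/sec; solr1 got {used.count('solr1')} queries")
```

Because the rotation is taken under a lock, N queries over k servers are split evenly, just as strict round-robin would split them.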

Benchmarks

Threads          Servers  Total/Unique queries  Caching          Performance (queries/sec)
25               2        2500/1950             D*               500
25               2        2500/2500             D                142
40               2        4000/4000             D                100
40               2        4000/3000             D                166
40               3        4000/4000             D                133
40 (backtoback)  3        4000/4000             D                333
40               3        4000/3300             D                142
10               3        2000/2000             D                434
40               3        4000/4000             Q.Caching: 1024  158
40 (backtoback)  3        4000/4000             Q.Caching: 1024  384
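One factor worth checking alongside index size: a query result cache can only answer a query it has already seen, so within a single run at most (total - unique) / total of the queries can be served from it. Both runs with Q.Caching enabled used 4000 unique queries out of 4000, so that bound is 0 for a single pass; note also that a 1024-entry cache cannot hold all 4000 distinct queries for the backtoback rerun. Assuming "Q.Caching" refers to Solr's query result cache, a small Python sketch of that bound for the Total/Unique pairs in the table:

```python
def max_query_cache_hit_rate(total, unique):
    """Upper bound on the query result cache hit rate within a single run.

    Only the (total - unique) repeated queries can be answered from the
    cache, and only if the cache is large enough to still hold them.
    """
    if not 0 < unique <= total:
        raise ValueError("expected 0 < unique <= total")
    return (total - unique) / total


# (total, unique) pairs taken from the Benchmarks table
runs = [(2500, 1950), (2500, 2500), (4000, 4000), (4000, 3000), (4000, 3300), (2000, 2000)]
for total, unique in runs:
    bound = max_query_cache_hit_rate(total, unique)
    print(f"{total}/{unique}: at most {bound:.1%} of queries can be cache hits")
```

This is only an upper bound; it says nothing about other caches (filter cache, document cache, the operating system's file cache) that can also speed up repeated or related queries.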


Without US territory

Threads  Servers  Total/Unique queries  Caching  Performance (queries/sec)
40       3        4000/4000             D        142
40       2        4000/4000             D        100


Moving territory:US from query to Filters

Threads  Servers  Total/Unique queries  Caching           Performance (queries/sec)
40       3        4000/4000             F.Caching: 16384  133
40       3        4000/3400             F.Caching: 16384  147

   * D means caching was disabled.
   * *backtoback* means the same test script was run again right after
     the previous run.
   * CPU usage while the servers were processing queries was ~40-50%.
   * Tomcat showed 3% memory usage.
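To make the "Moving territory:US from query to Filters" comparison concrete at the HTTP level, here is a sketch of the two request shapes using Solr's `fq` (filter query) parameter. The select URL (host, port, path), the sample query text, and the exact way the original queries combined the territory clause with the user's terms are all assumptions. With `fq`, the set of documents matching territory:US does not affect scoring and can be computed once, cached in Solr's filter cache (assuming "F.Caching" above is that cache), and reused by every query; since the territory was always US, a single cache entry serves the whole run.

```python
from urllib.parse import urlencode

# Hypothetical Solr select URL; host and port are placeholders.
SOLR_SELECT = "http://localhost:8080/solr/select"


def territory_in_query(user_query, territory):
    """territory is part of the main (scored) query string."""
    params = {"q": f"({user_query}) AND territory:{territory}"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


def territory_as_filter(user_query, territory):
    """territory is moved to a filter query (fq): it restricts the result set
    without affecting scores, and its document set can be cached and reused."""
    params = {"q": user_query, "fq": f"territory:{territory}"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    print(territory_in_query("ipod OR nano", "US"))
    print(territory_as_filter("ipod OR nano", "US"))
```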



Otis Gospodnetic wrote:
Hi Ajanta,

I think you answered your own questions.  Either use Filters or partition the
index.  The advantage of partitioning is that you can update each partition
separately without affecting the filters, caches, searcher, etc. of the other
indices (i.e. there is no need to warm up with data from the other indices).
If you are indeed working with high QPS, partitioning also lets you scale the
indices separately (are all territories the same size document-wise?  Do they
all get the same QPS?).  The disadvantage is that you can't easily run queries
that don't depend on a territory.
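The partitioning option can be sketched as a routing table from territory to a dedicated Solr index. The host names, ports and the second territory ("UK") below are hypothetical placeholders, since the thread does not say which territories exist or how the indices would be deployed. Because each partition holds only one territory's documents, the territory clause can be dropped from the query; a territory-independent query, by contrast, has to be sent to every partition and the results merged by the caller, which is the disadvantage described above.

```python
# Hypothetical mapping from territory to a dedicated Solr index (one per
# partition). Host names, ports and the "UK" entry are placeholders.
TERRITORY_INDEXES = {
    "US": "http://solr-us:8080/solr/select",
    "UK": "http://solr-uk:8080/solr/select",
}


def select_url(territory):
    """Return the select URL of the index holding only `territory`'s documents."""
    try:
        return TERRITORY_INDEXES[territory]
    except KeyError:
        raise ValueError(f"no index configured for territory {territory!r}") from None


def indexes_for_all_territories():
    """A territory-independent query must be sent to every partition; the
    caller then has to merge the per-partition results itself."""
    return list(TERRITORY_INDEXES.values())


if __name__ == "__main__":
    print(select_url("US"))                    # one index, no territory clause needed
    print(len(indexes_for_all_territories()))  # fan-out for cross-territory queries
```

A side benefit of this layout is that each entry can point at a differently sized pool of servers, which is what scaling territories separately would look like in practice.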

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: Ajanta <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2007 11:35:13 AM
Subject: system architecture question when using solr/lucene



We expect a large number of queries per second and would like to optimize for
that as much as possible. Our special requirement is that results depend on a
specific field, territory: depending on where in the world you are coming
from, we'd like to show you different results. The index is very large
(currently 2 million rows) and could grow 2-3 times larger in the future. How
do we accomplish this, given that we have some domain knowledge (the
territory) to use to our advantage? Is there a way to hint Solr/Lucene to use
this information to provide better results? We could use filters on
territory, or we could use different indexes for different territories
(individually or in combination). Are there any other ways to do this? How do
we figure out the best approach in this situation?

