Thanks to both of you, Otis and Chris, for your responses. We did manage
to run some benchmarks, but some of the results surprised us: caching
does not seem to affect performance much. Is that because of the small
index size?
Do these numbers seem OK, or is there room for improvement in any way
that you can think of?
Regards,
Ajanta.
Results from development servers
<https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Resultsfromdevelopmentservers>
Solr HTTP Interface
Configurations
<https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Configurations>
* Index size is approx 500M (a little more)
* Tomcat 6.0
* Solr (nightly build dated 2007-04-19)
* Nginx v0.5.20 is used as the load balancer (very lightweight in size,
  functionality and CPU consumption) with round-robin distribution
  of requests.
* Grinder v3.0-beta33 was used for testing. It allows one to write
  custom scripts (in Jython) and has a nice GUI for presenting
  results.
* Server config: Intel® Xeon™ 3040 1.87GHz 1066MHz, 4GB RAM
  (system boot usage 300MB), 8GB swap
* The query list was custom built from the web, with some queries
  having AND/OR between terms. The territory field was always US.
Benchmarks
<https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Benchmarks>
Threads          Servers  Total/Unique queries  Caching          Performance (queries/sec)
25               2        2500/1950             D*               500
25               2        2500/2500             D                142
40               2        4000/4000             D                100
40               2        4000/3000             D                166
40               3        4000/4000             D                133
40 (backtoback)  3        4000/4000             D                333
40               3        4000/3300             D                142
10               3        2000/2000             D                434
40               3        4000/4000             Q.Caching: 1024  158
40 (backtoback)  3        4000/4000             Q.Caching: 1024  384
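
One way to read the Total/Unique column: within a single cold run, only a repeated query can be answered from a query result cache, so the share of repeats is an upper bound on the cache hit ratio. The sketch below just does that arithmetic on the figures from the table above; the function name is made up for illustration, and it ignores other effects such as OS disk caching or warm caches left over from a previous run.

```python
def max_cache_hit_ratio(total_queries, unique_queries):
    """Upper bound on the fraction of requests a result cache could
    serve during one cold run: only repeated queries can be hits."""
    if total_queries <= 0 or not 0 < unique_queries <= total_queries:
        raise ValueError("expected 0 < unique_queries <= total_queries")
    repeats = total_queries - unique_queries
    return repeats / total_queries


# (total, unique) pairs taken from the benchmark table
runs = [(2500, 1950), (2500, 2500), (4000, 3000), (4000, 3300), (4000, 4000)]
for total, unique in runs:
    ratio = max_cache_hit_ratio(total, unique)
    print("%d/%d -> at most %.1f%% cache hits" % (total, unique, ratio * 100))
```

For the 4000/4000 runs the bound is zero, so a query cache cannot help on the first pass; only a back-to-back rerun of the same list could benefit from it.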
Without US territory
<https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#WithoutUSterritory>
Threads  Servers  Total/Unique queries  Caching  Performance (queries/sec)
40       3        4000/4000             D        142
40       2        4000/4000             D        100
Moving territory:US from query to Filters
<https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Movingterritory:USfromquerytoFilters>
Threads  Servers  Total/Unique queries  Caching           Performance (queries/sec)
40       3        4000/4000             F.Caching: 16384  133
40       3        4000/3400             F.Caching: 16384  147
* D means caching was disabled.
* *backtoback* means the same test was run again immediately afterwards.
* CPU usage while a server was processing queries was ~40-50%.
* Tomcat shows 3% memory usage.
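
For reference, the difference between keeping territory in the main query and moving it to a filter comes down to which request parameter carries the restriction. Solr's `fq` parameter is cached in the filter cache independently of `q`, so a single cached `territory:US` filter can be reused across different queries. The sketch below only builds the two request URLs; the host, port and path in `SOLR_SELECT` and the function names are assumptions for illustration, not our actual setup.

```python
from urllib.parse import urlencode

# Hypothetical Solr select endpoint; adjust to the real deployment.
SOLR_SELECT = "http://localhost:8080/solr/select"


def query_with_territory_in_q(terms, territory="US"):
    """Territory is part of the main query, so it is scored and
    cached together with the user's terms."""
    params = [("q", "(%s) AND territory:%s" % (terms, territory))]
    return SOLR_SELECT + "?" + urlencode(params)


def query_with_territory_filter(terms, territory="US"):
    """Territory is sent as a filter query (fq), which Solr caches
    separately from the main query."""
    params = [("q", terms), ("fq", "territory:%s" % territory)]
    return SOLR_SELECT + "?" + urlencode(params)


print(query_with_territory_in_q("madonna OR prince"))
print(query_with_territory_filter("madonna OR prince"))
```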
Otis Gospodnetic wrote:
Hi Ajanta,
I think you answered your own questions. Either use Filters or partition the
index. The advantage of partitioning is that you can update them separately
without affecting filters, cache, searcher, etc. for the other indices (i.e. no
need to warm up with data from the other indices). If you are indeed working
with high QPS, partitioning also lets you scale indices separately (are all
territories the same size document-wise? do they all get the same QPS?). The
disadvantage is that you can't easily run queries that don't depend on a
territory.
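
A minimal sketch of what the partitioned option could look like on the client side, assuming one Solr instance per territory. The URLs, the territory codes and the `urls_for` helper are hypothetical; the point is only that a territory-specific query goes to one index, while a query without a territory has to be sent to every index and merged by the caller.

```python
# Hypothetical mapping from territory code to that territory's index.
TERRITORY_INDEXES = {
    "US": "http://localhost:8080/solr-us/select",
    "GB": "http://localhost:8080/solr-gb/select",
    "DE": "http://localhost:8080/solr-de/select",
}


def urls_for(territory=None):
    """Return the select URLs a query must be sent to.

    With a territory, exactly one index is queried. Without one,
    every index must be queried and the results merged by the caller.
    """
    if territory is None:
        return sorted(TERRITORY_INDEXES.values())
    try:
        return [TERRITORY_INDEXES[territory.upper()]]
    except KeyError:
        raise ValueError("no index configured for territory %r" % territory)


print(urls_for("us"))   # a single index
print(urls_for())       # fan-out to all indexes
```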
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Lucene Consulting -- http://lucene-consulting.com/
----- Original Message ----
From: Ajanta <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 15, 2007 11:35:13 AM
Subject: system architecture question when using solr/lucene
We are currently looking at large numbers of queries/sec and would like to
optimize for that as much as possible. Our special need is that we would like
to show results based on one specific field, the territory field: depending on
where in the world you're coming from, we'd like to show you different
results. The index is very large (currently 2 million rows) and
could grow even larger (2-3 times) in the future. How do we accomplish this
given that we have some domain knowledge (the territory) to use to our
advantage? Is there a way we can hint solr/lucene to use this information to
provide better results? We could use filters on territory or we could use
different indexes for different territories (individually or in a
combination.) Are there any other ways to do this? How do we figure out the
best case in this situation?