Re: system architecture question when using solr/lucene

Yonik Seeley Mon, 21 May 2007 08:23:45 -0700

What are some typical examples of your queries (all of the params that
are sent to Solr)?
Query and Document caches typically result in small increases in performance.
The filterCache can result in large increases, depending on the queries.
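Why the filterCache can help so much here: a constraint such as territory:US is shared by every query, so its set of matching documents only needs to be computed once and can then be reused. Below is a simplified toy model in Python. It is not Solr's actual implementation, which caches document sets inside the index searcher. The `DOCS` data, `FilterCache` class and `search` function are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy index: doc id -> field values.
DOCS: Dict[int, Dict[str, str]] = {
    1: {"territory": "US", "title": "blue guitar"},
    2: {"territory": "UK", "title": "blue guitar"},
    3: {"territory": "US", "title": "red piano"},
    4: {"territory": "US", "title": "blue piano"},
}


class FilterCache:
    """Toy stand-in for a filter cache: filter string -> set of matching doc ids."""

    def __init__(self) -> None:
        self._cache: Dict[str, FrozenSet[int]] = {}
        self.misses = 0

    def get(self, fq: str) -> FrozenSet[int]:
        if fq not in self._cache:
            # Cache miss: scan the index once for this filter (e.g. "territory:US").
            self.misses += 1
            field, value = fq.split(":", 1)
            self._cache[fq] = frozenset(
                doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value
            )
        return self._cache[fq]


def search(term: str, fq: str, cache: FilterCache) -> List[int]:
    # The query part differs per request, so it is evaluated every time...
    hits = {doc_id for doc_id, doc in DOCS.items() if term in doc["title"].split()}
    # ...while the shared filter comes from the cache after the first request.
    return sorted(hits & cache.get(fq))


cache = FilterCache()
print(search("blue", "territory:US", cache))   # [1, 4]
print(search("piano", "territory:US", cache))  # [3, 4]
print(cache.misses)                             # 1
```

Two different queries share one filter computation. That reuse is the effect the filterCache provides when the same filter appears in many requests.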


Another possibility is that you may be hitting some other bottleneck,
possibly caused by synchronization... 40 threads seems kind of high
(unless they pause between requests).

-Yonik
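A rough way to check the point about 40 threads uses Little's law (concurrency = throughput × mean response time). In a closed-loop load test, where every thread sends its next request as soon as the previous one returns, this gives mean response time = threads / throughput. The Python sketch below applies it to three caching-disabled rows from the benchmark table quoted below. It assumes the Grinder threads did not pause between requests, and the rows differ in server count (3, 2 and 2 servers).

```python
def mean_response_time_s(threads: int, queries_per_sec: float) -> float:
    """Little's law for a closed loop with no think time:
    concurrency = throughput * response time, so response time = threads / throughput."""
    if queries_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return threads / queries_per_sec


# (threads, servers, queries/sec) from caching-disabled rows of the benchmark table.
rows = [(10, 3, 434), (25, 2, 142), (40, 2, 100)]
for threads, servers, qps in rows:
    ms = mean_response_time_s(threads, qps) * 1000
    print(f"{threads} threads, {servers} servers, {qps} q/s -> ~{ms:.0f} ms per request")
```

This prints roughly 23 ms, 176 ms and 400 ms. As threads are added, response time rises sharply while throughput falls. That pattern is consistent with requests queueing behind a shared bottleneck. Because the server counts differ, though, it is only a hint and not a diagnosis.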

On 5/21/07, Ajanta Phatak <[EMAIL PROTECTED]> wrote:

Thanks to both of you, Otis and Chris, for your responses. We did manage
to run some benchmarks, and some of the results surprised us. It seems
that caching does not affect performance much. Is that because of the
small index size?

Do these numbers seem OK, or is there room for improvement in any way
that you can think of?

Regards,
Ajanta.

Results from development servers: Solr HTTP interface

Configurations


    * Index size is approx. 500 MB (a little more)
    * Tomcat 6.0
    * Solr (nightly build dated 2007-04-19)
    * Nginx v0.5.20 is used as the load balancer (very lightweight in size,
      functionality and CPU consumption), with round-robin distribution
      of requests.
    * Grinder v3.0-beta33 was used for load testing. It lets you write
      custom test scripts (in Jython) and has a nice GUI for presenting
      results.
    * Server config: Intel® Xeon™ 3040, 1.87 GHz, 1066 MHz FSB, 4 GB RAM
      (300 MB used at boot), 8 GB swap
    * The query list was custom-built from the web; some queries have
      AND/OR between terms. The territory field was always US.

Benchmarks

Threads             Servers  Total/unique queries  Caching           Performance (queries/sec)
25                  2        2500/1950             D*                500
25                  2        2500/2500             D                 142
40                  2        4000/4000             D                 100
40                  2        4000/3000             D                 166
40                  3        4000/4000             D                 133
40 (back-to-back)   3        4000/4000             D                 333
40                  3        4000/3300             D                 142
10                  3        2000/2000             D                 434
40                  3        4000/4000             Q.Caching: 1024   158
40 (back-to-back)   3        4000/4000             Q.Caching: 1024   384
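One plausible reason the query cache made little difference: on a cold cache, only repeated queries can be answered from the query result cache. The share of repeats in each query list therefore caps the achievable hit ratio. The small Python sketch below computes that cap from the Total/unique column. It ignores cache capacity (such as the Q.Caching: 1024 setting). It also does not apply to the back-to-back runs, whose cache may already be warm from the previous run.

```python
def max_query_cache_hit_ratio(total: int, unique: int) -> float:
    """Upper bound on the query result cache hit ratio for a run that starts cold.

    The first request for each distinct query must miss, so at most
    (total - unique) of the requests can be served from the cache. Cache size
    limits, evictions and concurrent duplicates can only push the real ratio lower.
    """
    if not 0 < unique <= total:
        raise ValueError("need 0 < unique <= total")
    return (total - unique) / total


# Total/unique pairs from the benchmark table.
for total, unique in [(2500, 1950), (4000, 3000), (4000, 3300), (4000, 4000)]:
    bound = max_query_cache_hit_ratio(total, unique)
    print(f"{total}/{unique}: at most {bound:.1%} of requests can be cache hits")
```

For the 4000/4000 run with Q.Caching: 1024 the cap is 0%, so within that run the query result cache had nothing to hit.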


Without US territory

Threads             Servers  Total/unique queries  Caching           Performance (queries/sec)
40                  3        4000/4000             D                 142
40                  2        4000/4000             D                 100


Moving territory:US from the query to filters

Threads             Servers  Total/unique queries  Caching           Performance (queries/sec)
40                  3        4000/4000             F.Caching: 16384  133
40                  3        4000/3400             F.Caching: 16384  147

    * D implies caching was disabled
    * Q.Caching / F.Caching: query result cache / filter cache enabled,
      with the size shown
    * back-to-back implies the same test was run again
    * CPU usage while the server was processing queries was ~40-50%
    * Tomcat shows 3% memory usage
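Putting the territory restriction in a separate fq (filter query) parameter instead of the main q lets Solr cache the territory:US document set in the filterCache. That cached set can then be reused across different user queries, and the filter does not affect relevance scores. Below is a minimal Python sketch of building such a request URL. The host, port and /solr/select path are assumptions about a default single-core setup and are not taken from the configuration above. `build_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed default Solr endpoint; adjust host, port and path for the real deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(user_query: str, territory: str, rows: int = 10) -> str:
    """Keep the user's query in q and the territory restriction in fq, so the
    filter can be cached once and shared by all queries for that territory."""
    params = {
        "q": user_query,
        "fq": f"territory:{territory}",
        "rows": rows,
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url("blue AND guitar", "US")
print(url)
```

This produces `...?q=blue+AND+guitar&fq=territory%3AUS&rows=10`. The alternative puts `territory:US AND ...` inside q, and then the territory clause is evaluated as part of every distinct query.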



Otis Gospodnetic wrote:
> Hi Ajanta,
>
> I think you answered your own questions.  Either use filters or partition the
> index.  The advantage of partitioning is that you can update each partition
> separately without affecting filters, caches, searchers, etc. for the other
> indices (i.e. no need to warm up with data from the other indices).  If you
> are indeed working with high QPS, partitioning also lets you scale indices
> separately (are all territories the same size document-wise?  do they all get
> the same QPS?).  The disadvantage is that you can't easily run queries that
> don't depend on a territory.
>
> Otis
>  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Lucene Consulting -- http://lucene-consulting.com/
>
>
> ----- Original Message ----
> From: Ajanta <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 15, 2007 11:35:13 AM
> Subject: system architecture question when using solr/lucene
>
>
>
> We are expecting a large number of queries/sec and would like to optimize
> for that as much as possible. The special requirement is that we want to
> show results based on a specific field, the territory field: depending on
> where in the world you're coming from, we'd like to show you
> territory-specific results. The index is very large (currently 2 million
> rows) and could grow even larger (2-3 times) in the future. How do we
> accomplish this, given that we have some domain knowledge (the territory)
> to use to our advantage? Is there a way we can hint Solr/Lucene to use this
> information to provide better results? We could use filters on territory,
> or we could use different indexes for different territories (individually
> or in combination). Are there any other ways to do this? How do we figure
> out the best approach in this situation?
>
>
>
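The partitioning alternative moves the territory choice to the client or load balancer. Each request is routed to the index for the user's territory and needs no territory filter at all. Below is a hypothetical Python sketch. The territory-to-URL mapping, host names and `route` helper are invented for illustration. Keeping a combined index for territory-independent queries is one assumed way to cover the disadvantage of partitioning, at the cost of indexing everything twice.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical mapping of territories to separately deployed Solr indexes.
TERRITORY_INDEXES: Dict[str, str] = {
    "US": "http://solr-us:8983/solr/select",
    "UK": "http://solr-uk:8983/solr/select",
}
# Hypothetical index holding all territories, for territory-independent queries.
COMBINED_INDEX = "http://solr-all:8983/solr/select"


def route(user_query: str, territory: Optional[str]) -> str:
    """Send the query to its territory's own index when one exists; otherwise fall
    back to the combined index, adding a territory filter if a territory was given."""
    if territory in TERRITORY_INDEXES:
        return TERRITORY_INDEXES[territory] + "?" + urlencode({"q": user_query})
    params = {"q": user_query}
    if territory is not None:
        params["fq"] = f"territory:{territory}"
    return COMBINED_INDEX + "?" + urlencode(params)


print(route("guitar", "US"))
print(route("guitar", "FR"))
print(route("guitar", None))
```

Each per-territory index can then be updated, warmed and scaled on its own, as Otis describes.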
