Re: system architecture question when using solr/lucene

James liu Mon, 21 May 2007 18:59:56 -0700

First, you should know your goal.

Second, you should analyze which search interface fits your customers.

Third, you should analyze your queries (optimize Solr for the most
frequently used queries).
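
As a rough illustration of that third point, here is a minimal sketch that summarizes a query list before tuning caches. The function name and the sample queries are made up for illustration; only the total/unique idea comes from the benchmark tables quoted below. It rests on one assumption worth stating: a result cache can only help on a query that has been seen before, so (total - unique) / total is a best-case hit ratio for a cold cache.

```python
from collections import Counter


def summarize_queries(queries):
    """Return total, unique, best-case cache hit ratio, and the top queries."""
    # Collapse whitespace only; case is kept because AND/OR are case-sensitive.
    normalized = [" ".join(q.split()) for q in queries]
    counts = Counter(normalized)
    total = len(normalized)
    unique = len(counts)
    # A repeated query can hit the cache only after its first occurrence.
    max_hit_ratio = (total - unique) / total if total else 0.0
    return total, unique, max_hit_ratio, counts.most_common(3)


# Hypothetical sample; a real list would come from the application's query log.
sample_log = ["madonna", "madonna ", "jazz AND piano", "rock OR pop", "madonna"]
total, unique, ratio, top = summarize_queries(sample_log)
print(total, unique, round(ratio, 2), top)
```

By the same arithmetic, a 4000/4000 run has no repeated queries at all, and a 2500/1950 run tops out at 550/2500 = 22% on a first pass.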

Does 40 threads/s mean you use 40 Solr instances, or does it just
indicate a higher volume of user queries?


2007/5/21, Yonik Seeley <[EMAIL PROTECTED]>:


What are some typical examples of your queries (all of the params that
are sent to Solr)?
Query and Document caches typically result in small increases in
performance.
The filterCache can result in large increases, depending on the queries.
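
To make the difference concrete, here is a small sketch of the two request shapes: the territory restriction folded into `q`, versus sent as a separate `fq` filter query that the filterCache can reuse across different user queries. `q`, `fq`, and `rows` are standard Solr request parameters; the host, port, and path are assumptions, and `build_query_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and path to the actual deployment.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"


def build_query_url(user_query, territory=None, use_filter=True):
    """Build a select URL with the territory either in fq or folded into q."""
    params = [("q", user_query)]
    if territory is not None:
        clause = "territory:%s" % territory
        if use_filter:
            # Separate filter query: cached independently of the main query.
            params.append(("fq", clause))
        else:
            # Territory is part of the main query, so every q is distinct.
            params[0] = ("q", "(%s) AND %s" % (user_query, clause))
    params.append(("rows", "10"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


filtered = build_query_url("jazz AND piano", "US")
inline = build_query_url("jazz AND piano", "US", use_filter=False)
print(filtered)
print(inline)
```

With the filter form, one cached `territory:US` entry serves every request, so a very large filterCache should not be needed for a single territory value.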

Another possibility is that you may be hitting some other bottleneck,
possibly caused by synchronization... 40 threads seems kind of high
(unless they pause between requests).
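
One way to sanity-check this from the posted numbers is Little's Law for a closed-loop test: threads = throughput * (response time + pause). The sketch below assumes the Grinder threads issue requests back to back with no pause unless `think_time` is given; the function name is hypothetical and the inputs are taken from the quoted benchmark table.

```python
def mean_latency_seconds(threads, queries_per_sec, think_time=0.0):
    """Estimate mean response time for a closed-loop load test."""
    if queries_per_sec <= 0:
        raise ValueError("queries_per_sec must be positive")
    # Each thread spends (response time + think time) per request, so
    # threads = queries_per_sec * (latency + think_time).
    return threads / queries_per_sec - think_time


# Rows from the quoted table: 40 threads at 133 q/s, 10 threads at 434 q/s.
print(round(mean_latency_seconds(40, 133), 3))
print(round(mean_latency_seconds(10, 434), 3))
```

Under that assumption the 40-thread run works out to roughly 0.3 s per request against roughly 0.023 s for the 10-thread run, which would be consistent with requests queuing behind some shared resource rather than with the caches being ineffective.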

-Yonik

On 5/21/07, Ajanta Phatak <[EMAIL PROTECTED]> wrote:
> Thanks to both of you for your responses - Otis and Chris. We did manage
> to run some benchmarks, but we think there are some surprising results
> here. It seems that caching is not affecting performance that much. Is
> that because of the small index size?
>
> Do these seem ok or is there any room for improvement in anyway that you
> could think of?
>
> Regards,
> Ajanta.
>
> Results from development servers
> <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Resultsfromdevelopmentservers>
>
> Solr HTTP Interface
>
> Configurations
> <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Configurations>
>
>
>     * Index size is approx 500M (a little more)
>     * Tomcat 6.0
>     * Solr (nightly build dated 2007-04-19)
>     * Nginx v0.5.20 is used as load balancer (very light weight in size,
>       functionality and cpu consumption) with round-robin distribution
>       of requests.
>     * Grinder v3.0-beta33 was used for testing. This allows one to write
>       custom scripts (in jython) and has nice GUI interface for
>       presenting results.
>     * Server config: Intel(R) Xeon(TM) 3040 1.87GHz 1066MHz, 4GB RAM
>       (system boot usage 300MB), 8GB swap
>     * The query list was custom built from the web, with some queries
>       having AND/OR between terms. The territory field was always US.
>
> Benchmarks
> <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Benchmarks>
>
> Threads         Servers  Total/Unique queries  Caching          Performance (queries/sec)
> 25              2        2500/1950             D*               500
> 25              2        2500/2500             D                142
> 40              2        4000/4000             D                100
> 40              2        4000/3000             D                166
> 40              3        4000/4000             D                133
> 40(backtoback)  3        4000/4000             D                333
> 40              3        4000/3300             D                142
> 10              3        2000/2000             D                434
> 40              3        4000/4000             Q.Caching: 1024  158
> 40(backtoback)  3        4000/4000             Q.Caching: 1024  384
>
>
> Without US territory
> <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#WithoutUSterritory>
>
> Threads         Servers  Total/Unique queries  Caching          Performance (queries/sec)
> 40              3        4000/4000             D                142
> 40              2        4000/4000             D                100
>
>
> Moving territory:US from query to Filters
> <https://storesvn.limewire.com/trac/limestore/wiki/BenchmarkResults#Movingterritory:USfromquerytoFilters>
>
> Threads         Servers  Total/Unique queries  Caching           Performance (queries/sec)
> 40              3        4000/4000             F.Caching: 16384  133
> 40              3        4000/3400             F.Caching: 16384  147
>
>     * D implies caching was disabled
>     * *backtoback* implies same code was run again
>     * CPU usage when server was processing query was ~40-50%
>     * Tomcat shows 3% memory usage.
>
>
>
> Otis Gospodnetic wrote:
> > Hi Ajanta,
> >
> > I think you answered your own questions.  Either use Filters or
> > partition the index.  The advantage of partitioning is that you can
> > update them separately without affecting filters, cache, searcher,
> > etc. for the other indices (i.e. no need to warm up with data from
> > the other indices).  If you are indeed working with the high QPS,
> > partitioning also lets you scale indices separately (are all
> > territories the same size document-wise?  do they all get the same
> > QPS?).  The disadvantage is that you can't easily run queries that
> > don't depend on a territory.
> >
> > Otis
> >  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > Lucene Consulting -- http://lucene-consulting.com/
> >
> >
> > ----- Original Message ----
> > From: Ajanta <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Tuesday, May 15, 2007 11:35:13 AM
> > Subject: system architecture question when using solr/lucene
> >
> >
> >
> > We are currently looking at large numbers of queries/sec and would
> > like to optimize that as much as possible. The special need is that
> > we would like to show specific results based on a specific field -
> > territory field and depending on where in the world you're coming
> > from we'd like to show you specific results. The index is very large
> > (currently 2 million rows) and could grow even larger (2-3 times) in
> > the future. How do we accomplish this given that we have some domain
> > knowledge (the territory) to use to our advantage? Is there a way we
> > can hint solr/lucene to use this information to provide better
> > results? We could use filters on territory or we could use different
> > indexes for different territories (individually or in a
> > combination.)  Are there any other ways to do this? How do we figure
> > out the best case in this situation?
> >
> >
> >
>




--
regards
jl
