Hi,

Changing cache sizes doesn't require reindexing. You have high IO wait -
waiting on your disks? Ideally your index will be cached. Lower those
caches, possibly reduce the heap size, and leave more RAM to the OS for
caching, and the IO wait will hopefully go down. I'd try with just -Xmx4g
and see.
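To make that concrete, here is a minimal sketch of what smaller cache settings could look like in the `<query>` section of solrconfig.xml. The sizes below are illustrative starting points, not tuned values - grow them only if monitoring shows evictions:

```xml
<query>
  <!-- Much smaller than the current settings; increase only on evictions -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```

Combined with starting the JVM with -Xmx4g instead of 8g, this leaves more RAM for the OS page cache to hold the index files.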
That Python process there - maybe you can kill the snake so it's not using
the CPU; it seems to be eating a good % of it.

Oh, I just looked at your query. It's massive. I couldn't quite see the
whole thing. What exactly are you trying to do with such a long query?
Maybe describe the high-level goal you have.

Otis
--
Solr & ElasticSearch Support - http://sematext.com/


On Thu, Jun 13, 2013 at 9:51 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> Otis, Shawn,
>
> Thanks for the reply.
> You can find my schema.xml and solrconfig.xml here:
> https://gist.github.com/utkarsh2012/5778811
>
>
> To answer your questions:
>
> Those are massive caches. Rethink their size. More specifically,
> plug in some monitoring tool and see what you are getting out of them.
> Just today I looked at one Sematext client's caches - 200K entries,
> 0 evictions ==> needless waste of JVM heap. So lower those numbers
> and increase only if you are getting evictions.
>
> Sure, I will reduce the counts and see how it goes. The problem I have is
> that after such a change, I need to reindex everything again, which is
> slow and takes time (40-60 hours).
>
> &debugQuery=true output will tell you something about timings, etc.
>
> Some queries are really bad, like this one:
> http://explain.solr.pl/explains/bzy034qi
> How can this be improved? I understand that there is something horribly
> wrong here, but I'm not sure which points to look at (I've been using
> Solr for the last 20 days).
>
> consider edismax and the qf param instead of that field copy stuff, info
> on zee Wiki
>
> Related back to my last point, how can such a query be improved? Maybe
> using qf?
>
> back to monitoring - what is your bottleneck? The query looks
> simplistic. Is it IO? Memory? CPU? Share some graphs and let's look.
>
> The query is simple, although it uses edismax. I have shared an explain
> query above.
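The edismax/qf suggestion above boils down to this: instead of hand-building a huge OR query across copied fields, pass the raw user input as q and let qf list the fields and boosts. A small sketch of how such a request URL could be built (the field names and boosts here are hypothetical, not taken from the actual schema):

```python
from urllib.parse import urlencode

# Hypothetical fields/boosts; substitute whatever your copyField rules
# currently feed into the catch-all field.
params = {
    "q": "red running shoes",                  # plain user input, no hand-built OR clauses
    "defType": "edismax",                      # extended dismax query parser
    "qf": "title^2.0 brand^1.5 description",   # fields to search, with boosts
    "rows": 10,
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/collection1/select?" + query_string
print(url)
```

The point is that the query stays short and readable, and relevance tuning moves into the qf boosts rather than into ever-longer query strings.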
> Other than the query, these are my performance stats:
>
> iostat -m 5 result: http://apaste.info/hjNV
>
> top result: http://apaste.info/jlHN
>
>
> How often do you index and commit, and how many documents each time?
>
> This is done by DataStax's DSE. I assume it is configurable via
> solrconfig.xml. The updates to Cassandra are daily, but not all the
> documents are updated.
>
> What is your query rate?
>
> For the initial training, I will hit Solr 1.3M times and request 2000
> documents in each query. At the current speed (just one machine), it
> will take me ~20 days to do the initial training.
>
>
> Thanks,
> -Utkarsh
>
>
>
> On Thu, Jun 13, 2013 at 6:25 PM, Shawn Heisey <s...@elyograg.org> wrote:
>
>> On 6/13/2013 5:53 PM, Utkarsh Sengar wrote:
>> > *Problems:*
>> > The initial training pulls 2000 documents from Solr to find the most
>> > probable matches and calculates a score (PMI/NPMI). This query is
>> > extremely slow. A regular query also takes 3-4 seconds.
>> > I am running Solr currently on just one VM with 12GB RAM; 8GB of heap
>> > space is allocated to Solr, and the block storage is an SSD.
>>
>> Normally, I would say that you should have as much RAM as your heap size
>> plus your index size, so with your 8GB heap and 15GB index, you'd want
>> 24GB total RAM. With SSD, that requirement should not be quite so high,
>> but you might want to try 16GB or more. Solr works much better on bare
>> metal than it does on virtual machines.
>>
>> I suspect that what might be happening here is that your heap is just a
>> little bit too small for the combination of your index size (both
>> document count and disk space), how you use Solr, and your config, so
>> your JVM is constantly doing garbage collections.
>>
>> > What is the suggested setup for this use case?
>> > My guess is that setting up 4 Solr nodes will help, but what is the
>> > suggested RAM/heap for this kind of data?
>> > And what is the recommended configuration (solrconfig.xml) where I *need
>> > to speed up reads*?
>>
>> http://wiki.apache.org/solr/SolrPerformanceProblems
>> http://wiki.apache.org/solr/SolrPerformanceFactors
>>
>> Heap size requirements are hard to predict. I can tell you that it's
>> highly unlikely that you will need cache sizes as large as you have
>> configured. Start with the defaults and only increase them (by small
>> amounts) if your hit ratio is not high enough. If increasing the size
>> doesn't increase the hit ratio, there may be another problem.
>>
>> > Also, is there a way I can debug what is going on with Solr
>> > internally? As you can see, my queries are not that complex, so I
>> > don't need to debug my queries but just debug Solr and see the
>> > troubled pieces in it.
>>
>> If you add &debugQuery=true to your URL, Solr will give you a lot of
>> extra information in the response. One of the things that would be
>> important here is seeing how much time is spent in various components.
>>
>> > Also, I am new to Solr, so is there anything else I missed sharing
>> > that would help debug the problem?
>>
>> Sharing the entire config, schema, examples of all fields from your
>> indexed documents, and examples of your full queries would help.
>> http://apaste.info
>>
>> How often do you index and commit, and how many documents each time?
>> What is your query rate?
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Thanks,
> -Utkarsh
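To illustrate Shawn's debugQuery=true advice: the response's debug section includes a timing breakdown per search component, and the first thing to check is which component eats the 3-4 seconds. The sketch below walks an illustrative (made-up) timing structure of the shape Solr returns; the numbers and the exact nesting are an assumption for demonstration, not output from this index:

```python
# Trimmed, illustrative shape of the "debug" -> "timing" section returned
# with debugQuery=true; all numbers here are invented for the example.
timing = {
    "time": 3250.0,
    "prepare": {"time": 4.0, "query": {"time": 3.0}},
    "process": {
        "time": 3246.0,
        "query": {"time": 3100.0},   # main query execution
        "facet": {"time": 120.0},    # faceting component
        "debug": {"time": 26.0},     # cost of debug output itself
    },
}

# Rank components by processing time to see where the seconds actually go.
components = {k: v["time"] for k, v in timing["process"].items() if k != "time"}
slowest = max(components, key=components.get)
print(slowest, components[slowest])
```

In this invented example the query component dominates, which would point back at the query structure and caching rather than at faceting or highlighting.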