Hi,
Thanks for your suggestions. I'll be able to answer a few of your
questions right now; the rest I'll answer after some time. It takes
around 150k to 200k queries before it goes down again after restarting.
In a typical query we are returning around 20 fields. Memory utilization
peaks only after some time.


Regards,
Suryansh

On Tuesday, July 23, 2013, Jack Krupansky wrote:

> There was also a bug in the lazy loading of multivalued fields at one
> point recently in Solr 4.2
>
> https://issues.apache.org/jira/browse/SOLR-4589
> "4.x + enableLazyFieldLoading + large multivalued fields + varying fl =
> pathological CPU load & response time"
>
> Do you use multivalued fields very heavily?
>
> I'm still not ready to suggest that 1,000 fields is an okay thing to do,
> but there are still plenty of nuances in Solr performance that could
> explain the difficulties, before we even get to the 1,000 field issue
> itself.
>
> The real bottom line is that as you increase field count, there are lots
> of other aspects of Solr memory and performance degradation that increase
> as well. Some of those factors can be dealt with simply with more memory,
> more and faster CPU cores, or even more sharding, or other tuning, but not
> necessarily all of them.
>
> I think that I am already on the record on other threads as suggesting
> that "a couple hundred" is about the limit for field count for a "slam
> dunk" use of Solr. That doesn't mean you can't go above a couple hundred
> fields, just that you are in uncharted territory and may need to take
> extraordinary measures to get everything working satisfactorily. There's no
> magic hard limit, just a general sense that smaller numbers of fields are
> like "a walk in a park", while higher numbers of fields are like "chopping
> through a jungle." We each have our own threshold for... "adventure."
>
> We need answers to the previous questions we raised before we can analyze
> this a lot further.
>
> Oh, and make sure there is enough OS system memory available for caching
> of the index pages. Sometimes, it is little things like this that can crush
> Solr performance.
>
> Unfortunately, Solr is not a packaged "solution" that automatically and
> magically auto-configures everything to "work just right". Instead, it is a
> powerful toolkit that lets you do amazing things, but you the
> developer/architect need to supply amazing intelligence, wisdom, foresight,
> and insight to get it (and its hardware and software environment) to do
> those amazing things.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Alexandre Rafalovitch
> Sent: Tuesday, July 23, 2013 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: how number of indexed fields effect performance
>
> Do you need all of the fields loaded every time and are they stored? Maybe
> there is a document with gigantic content that you don't actually need but
> it gets deserialized anyway. Try lazy loading
> setting: enableLazyFieldLoading in solrconfig.xml
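>
> A minimal sketch of where that setting lives in solrconfig.xml (under the
> <query> section; placement and defaults may vary across Solr versions):
>
> ```xml
> <query>
>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
> </query>
> ```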
>
> Regards,
>   Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Tue, Jul 23, 2013 at 12:36 AM, Jack Krupansky <j...@basetechnology.com
> >wrote:
>
>  After restarting Solr and doing a couple of queries to warm the caches,
>> are queries already slow/failing, or does it take some time and a number
>> of
>> queries before failures start occurring?
>>
>> One possibility is that you just need a lot more memory for caches for
>> this amount of data. So, maybe the failures are caused by heavy garbage
>> collections. So, after restarting Solr, check how much Java heap is
>> available, then do some warming queries, then check the Java heap
>> available
>> again.
>>
>> Add the debugQuery=true parameter to your queries and look at the timings
>> to see what phases of query processing are taking the most time. Also
>> check
>> whether the reported QTime seems to match actual wall clock time;
>> sometimes
>> formatting of the results and network transfer time can dwarf actual query
>> time.
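>>
>> As a sketch of that comparison (the sample response below is
>> illustrative, not from your cluster):
>>
>> ```python
>> import json
>>
>> # Solr reports QTime (milliseconds spent executing the query) in the
>> # response header; wall-clock time is what the client measures around
>> # the whole HTTP round trip.
>> def qtime_ratio(response_body, elapsed_ms):
>>     qtime = json.loads(response_body)["responseHeader"]["QTime"]
>>     return elapsed_ms / max(qtime, 1)
>>
>> # A response reporting QTime=50 while the client measured 200 ms of
>> # wall-clock time gives a ratio of 4.0 -- most of the time went to
>> # response formatting and network transfer, not query execution.
>> sample = '{"responseHeader":{"QTime":50}}'
>> print(qtime_ratio(sample, 200.0))
>> ```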
>>
>> How many fields are you returning on a typical query?
>>
>>
>> -- Jack Krupansky
>>
>>
>> -----Original Message----- From: Suryansh Purwar
>> Sent: Monday, July 22, 2013 11:06 PM
>> To: solr-user@lucene.apache.org ; j...@basetechnology.com
>>
>> Subject: how number of indexed fields effect performance
>>
>> It was running fine initially, when we had only around 100 fields
>> indexed. In this case as well it runs fine at first, but after some time
>> a broken pipe exception starts occurring, which results in the shard
>> going down.
>>
>> Regards,
>> Suryansh
>>
>>
>>
>> On Tuesday, July 23, 2013, Jack Krupansky wrote:
>>
>>  Was all of this running fine previously and only started running slow
>>
>>> recently, or is this your first measurement?
>>>
>>> Are very simple queries (single keyword, no filters or facets or sorting
>>> or anything else, and returning only a few fields) working reasonably
>>> well?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Suryansh Purwar
>>> Sent: Monday, July 22, 2013 4:07 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: how number of indexed fields effect performance
>>>
>>> Hi,
>>>
>>> We have a two-shard SolrCloud cluster, with each shard allocated 3
>>> separate machines. We do complex queries involving a number of filter
>>> queries coupled with group queries and faceting. All of our machines
>>> are 64-bit with 32 GB of RAM. Our index size is around 10 GB, with
>>> around 800,000 documents. We have around 1000 indexed fields per
>>> document. 6 GB of memory is allocated to Tomcat, under which Solr is
>>> running, on each of the six machines. We have a ZooKeeper ensemble
>>> consisting of 3 ZooKeeper instances running on 3 of the six machines,
>>> with 4 GB of memory allocated to each ZooKeeper instance. First Solr
>>> starts taking too much time, with a "Broken pipe" exception (caused by
>>> a timeout on the client side) occurring again and again; then after
>>> some time a whole shard goes down, one machine at a time. Is having
>>> 1000 fields indexed with each document causing this problem? If so,
>>> what would be the ideal number of indexed fields in such an
>>> environment?
>>>
>>> Regards,
>>> Suryansh
>>>
>>>
>>>
>>
>
