Re: Solr Pagination

Salman Ansari Sat, 10 Oct 2015 07:54:05 -0700

Regarding the Solr performance issue I was facing, I upgraded my Solr machine
to have
8 cores
56 GB RAM
8 GB JVM


However, unfortunately, I am still getting delays. I have run

* the query "Football" with start=0 and rows=10 and it took around 7.329
seconds
* the query "Football" with start=1000 and rows=10 and it took around
21.994 seconds
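
A minimal sketch of how these two requests could be built and their QTime values compared. The host and port are placeholders (assumptions, not from this thread); the collection name sabr102 and the field content_text come from the log lines quoted further down. The helper names are hypothetical, and the HTTP call itself is left out.

```python
import json
from urllib.parse import urlencode

# Assumed base URL: host and port are placeholders, not taken from this thread.
BASE_URL = "http://localhost:8983/solr/sabr102/select"


def select_url(query, start, rows=10, base_url=BASE_URL):
    """Build a /select URL for one page of results."""
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


def qtime_ms(response_text):
    """Extract QTime (milliseconds) from a Solr JSON response body."""
    return json.loads(response_text)["responseHeader"]["QTime"]


def slowdown(first_page_ms, deep_page_ms):
    """How many times slower the deep page is than the first page."""
    return deep_page_ms / first_page_ms


if __name__ == "__main__":
    # The two requests described above: first page and a page at start=1000.
    for start in (0, 1000):
        print(select_url("content_text:Football", start))
    # QTime values from the log lines quoted in this thread.
    print(round(slowdown(3391, 21569), 2))
```

Note that QTime covers only the time Solr spends processing the request, so the wall-clock figures above will always be somewhat higher than the logged QTime.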

Looking at the Solr admin UI, I see that the RAM and JVM memory are not being
utilized to the maximum, not even half or a quarter. How do I push data into
the cache once Solr starts? And is pushing data into the cache the right
strategy to solve the issue?
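
As a rough check of the memory figures in this thread against the rule of thumb in Shawn's quoted reply below, here is a small sketch. It assumes the memory available for the OS disk cache is total RAM minus the Java heap, and it reads his 10GB/20GB/40GB example as ratios of 2x and 4x the cache size. Those ratios, the function names, and the wording of the verdicts are assumptions for illustration only.

```python
def page_cache_gb(total_ram_gb, heap_gb):
    """Memory left for the OS disk cache, assuming nothing else runs on the box."""
    return total_ram_gb - heap_gb


def memory_verdict(index_gb, cache_gb):
    """Apply the rule of thumb, read as ratios of index size to cache size."""
    if index_gb > 4 * cache_gb:
        return "definitely need more memory"
    if index_gb > 2 * cache_gb:
        return "might need more memory"
    return "within the rule of thumb"


if __name__ == "__main__":
    index_gb = 40  # disk consumption reported in this thread
    # Old machine (14GB RAM, 4GB heap) and upgraded machine (56GB RAM, 8GB heap).
    for ram_gb, heap_gb in ((14, 4), (56, 8)):
        cache_gb = page_cache_gb(ram_gb, heap_gb)
        print(ram_gb, heap_gb, cache_gb, memory_verdict(index_gb, cache_gb))
```

This only compares sizes. It says nothing about whether the index data has actually been read into the disk cache yet after a restart.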

Appreciate your comments.

Regards,
Salman



On Sat, Oct 10, 2015 at 11:55 AM, Salman Ansari <salman.rah...@gmail.com>
wrote:

> Thanks Shawn for your response. Based on that
> 1) Can you please direct me where I can get more information about cold
> shard vs hot shard?
>
> 2)  That 10GB number assumes there's no other software on the machine,
> like a database server or a webserver.
> Yes, the machine is dedicated to Solr.
>
> 3) How much index data is on the machine?
> I have 3 collections: 2 for testing (together they do not exceed 1M
> documents) and the main collection that I am querying now, which contains
> around 69M documents. I have distributed all my collections into 2 shards
> each, with 2 replicas. The consumption on the hard disk is about 40GB.
>
> 4) A memory size of 14GB would be unusual for a physical machine, and
> makes me wonder if you're using virtual machines
> Yes, I am using a virtual machine, since bare metal would be difficult in
> my case: our whole data center is in the cloud. I can increase its
> capacity though. While testing some edge cases on Solr, I noticed in the Solr
> admin UI that the memory sometimes reaches its limit (14GB RAM and 4GB JVM).
>
> 5) Just to confirm, I have combined the lessons from
>
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> AND
> https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> to come up with the following settings
>
> FilterCache
>
>     <filterCache class="solr.FastLRUCache"
>                  size="16384"
>                  initialSize="4096"
>                  autowarmCount="4096"/>
>
> DocumentCache
>
>     <documentCache class="solr.LRUCache"
>                    size="16384"
>                    initialSize="16384"
>                    autowarmCount="0"/>
>
> NewSearcher and FirstSearcher
>
>     <listener event="newSearcher" class="solr.QuerySenderListener">
>       <arr name="queries">
>         <lst><str name="q">*</str><str name="sort">score desc id desc</str></lst>
>       </arr>
>     </listener>
>     <listener event="firstSearcher" class="solr.QuerySenderListener">
>       <arr name="queries">
>         <lst><str name="q">*</str><str name="sort">score desc id desc</str></lst>
>         <!-- seed common facets and filter queries -->
>         <lst><str name="q">*</str><str name="facet.field">category</str></lst>
>       </arr>
>     </listener>
>
> Will this make Solr use more of its cache and prepopulate it?
>
> Regards,
> Salman
>
>
>
>
> On Sat, Oct 10, 2015 at 5:10 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 10/9/2015 1:39 PM, Salman Ansari wrote:
>>
>> > INFO  - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
>> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
>> > [sabr102_shard1_replica1] webapp=/solr path=/select
>> > params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
>> > QTime=3391
>>
>> Over 3 seconds for a query like this definitely sounds like there's a
>> problem.
>>
>> > INFO  - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
>> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
>> > [sabr102_shard1_replica1] webapp=/solr path=/select
>> > params={start=1000&q=(content_text:Football)&rows=10} hits=24408 status=0
>> > QTime=21569
>>
>> Adding a start value of 1000 increases QTime by a factor of more than
>> 6?  Even more evidence of a performance problem.
>>
>> For comparison purposes, I did a couple of simple queries on a large
>> index of mine.  Here are the response headers showing the QTime value
>> and all the parameters (except my shard URLs) for each query:
>>
>>   "responseHeader": {
>>     "status": 0,
>>     "QTime": 1253,
>>     "params": {
>>       "df": "catchall",
>>       "spellcheck.maxCollationEvaluations": "2",
>>       "spellcheck.dictionary": "default",
>>       "echoParams": "all",
>>       "spellcheck.maxCollations": "5",
>>       "q.op": "AND",
>>       "shards.info": "true",
>>       "spellcheck.maxCollationTries": "2",
>>       "rows": "70",
>>       "spellcheck.extendedResults": "false",
>>       "shards": "REDACTED SEVEN SHARD URLS",
>>       "shards.tolerant": "true",
>>       "spellcheck.onlyMorePopular": "false",
>>       "facet.method": "enum",
>>       "spellcheck.count": "9",
>>       "q": "catchall:carriage",
>>       "indent": "true",
>>       "wt": "json",
>>       "_": "1444420900498"
>>     }
>>
>>
>>   "responseHeader": {
>>     "status": 0,
>>     "QTime": 176,
>>     "params": {
>>       "df": "catchall",
>>       "spellcheck.maxCollationEvaluations": "2",
>>       "spellcheck.dictionary": "default",
>>       "echoParams": "all",
>>       "spellcheck.maxCollations": "5",
>>       "q.op": "AND",
>>       "shards.info": "true",
>>       "spellcheck.maxCollationTries": "2",
>>       "rows": "70",
>>       "spellcheck.extendedResults": "false",
>>       "shards": "REDACTED SEVEN SHARD URLS",
>>       "shards.tolerant": "true",
>>       "spellcheck.onlyMorePopular": "false",
>>       "facet.method": "enum",
>>       "spellcheck.count": "9",
>>       "q": "catchall:wibble",
>>       "indent": "true",
>>       "wt": "json",
>>       "_": "1444421001024"
>>     }
>>
>> The first query had a numFound of 120906, the second a numFound of 32.
>> When I re-executed the first query (the one with a QTime of 1253) so it
>> would use the Solr caches, QTime was 17.
>>
>> This is an index that has six cold shards with 38.8 million documents
>> each and a hot shard with 1.5 million documents.  Total document count
>> for the index is over 234 million documents, and the total size of the
>> index is about 272GB.  Each copy of the index has its shards split
>> between two servers that each have 64GB of RAM, with an 8GB max Java
>> heap.  I do not have enough memory to cache all the index contents in
>> RAM, but I can get a little less than half of it in the cache -- each
>> machine has about 56GB of cache available and contains around 135GB of
>> index data.  The index data is stored on a RAID10 array with six SATA
>> disks, so it's fairly fast, but nowhere near as fast as SSD.
>>
>> You've already mentioned the SolrPerformanceProblems wiki page that I
>> wrote, which is where I would normally send you for more information.
>> You said that your machine has 14GB of RAM and 4GB is allocated to Solr,
>> leaving about 10GB for caching.  That 10GB number assumes there's no
>> other software on the machine, like a database server or a webserver.
>> How much index data is on the machine?  You need to count all the Solr
>> cores.  If the "10GB for caching" figure is accurate, then more than
>> about 20GB of index data means you might need more memory.  If it's more
>> than about 40GB of index data, you definitely need more memory.
>>
>> A memory size of 14GB would be unusual for a physical machine, and makes
>> me wonder if you're using virtual machines.  Bare metal is always going
>> to offer better performance than a VM.  Another potential problem with
>> VMs is that the host system might have its memory oversubscribed -- the
>> total amount of memory in the host machine might be less than the total
>> amount of memory allocated to all the running virtual machines.  Solr
>> performance will be terrible if VM memory is oversubscribed.
>>
>> Thanks,
>> Shawn
>>
>>
>
