On 7/8/2019 12:00 PM, Midas A wrote:
> Number of Docs: 500000+ docs
> Index Size: 300 GB
> RAM: 256 GB
> JVM: 32 GB
Half a million documents producing an index size of 300 GB suggests
*very* large documents. Because text tokenization turns every distinct
word into an indexed term, that typically produces an index with fields
that have very high cardinality.
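As an illustration (a generic schema fragment, not taken from your
actual schema.xml), an analyzed text field indexes every distinct word
as a separate term, so a single large body field can easily reach
millions of unique terms:

    <!-- hypothetical analyzed field: every word becomes an indexed term -->
    <field name="contents" type="text_general" indexed="true" stored="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>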
Is Solr the only thing running on this machine, or does it have other
memory-hungry software running on it?
The screenshot described at the following URL may provide more insight.
It will be important to get the sort correct. If the columns have been
customized to show information other than the examples, it may need to
be adjusted:
https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
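For reference, on a Linux machine that screenshot is normally taken
from top with the process list sorted by memory:

    top
    # once top is running, press shift-M to sort by resident memory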
Assuming that Solr is the only thing on the machine, you have about
224 GB of memory available to cache your index data, which totals at
least 300 GB. Normally I would expect being able to cache two-thirds
of the index to be enough for good performance, but it's always
possible that something about your setup means you don't have enough
memory.
Are you sure that you need a 32GB heap? Half a million documents should
NOT require anywhere near that much heap.
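If you want to experiment with a smaller heap, it can be set in
solr.in.sh (assuming the standard service installation; the 8g value
below is only an illustration, find the right size by testing):

    # /etc/default/solr.in.sh (path may differ on your install)
    SOLR_HEAP="8g"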
> Cardinality:
> cat=44
> rol=1005
> ind=504
> cl=2000
These cardinality values are VERY low. If you are certain about those
numbers, it is not likely that these fields are significant contributors
to query time, either with or without docValues. How did you obtain
those numbers?
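One way to get trustworthy numbers (a sketch; the host, port, and
collection name below are placeholders) is the unique() aggregation in
the JSON Facet API:

    curl http://localhost:8983/solr/yourcollection/select \
      -d 'q=*:*&rows=0&json.facet={cat_unique:"unique(cat)"}'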
Those are not the only fields referenced in your query. I also see these:
hemp
cEmp
pEmp
is_udis
id
is_resume
upt_date
country
exp
ctc
contents
currdesig
predesig
lng
ttl
kw_sql
kw_it
> QTime: 2988 ms
Three seconds for a query with so many facets is something I would
probably be pretty happy to get.
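If you want to see where that time goes, Solr can report per-component
timings; a sketch (host and collection name are placeholders, and you
would add your real query parameters):

    curl 'http://localhost:8983/solr/yourcollection/select?q=*:*&debug=timing'

The timing section in the response breaks QTime down by search
component, including the facet component.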
> Our 35% queries takes more than 10 sec.
I have no idea what this sentence means.
> Please suggest the ways to improve response time . Attached queries and
> schema.xml and solrconfig.xml
> 1. Is there any other ways to rewrite queries that improve our query
> performance .?
With the information available, the only suggestion I have currently is
to replace "q=*" with "q=*:*" -- assuming that the intent is to match
all documents with the main query. According to what you attached
(which I am very surprised to see -- attachments usually don't make it
to the list), your df parameter is "ttl" ... a field that is heavily
tokenized. That means that the cardinality of the ttl field is probably
VERY high, which would make the wildcard query VERY slow.
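Concretely, the change would look like this (everything else in the
request stays the same):

    q=*      wildcard query against the df field (ttl), potentially very slow
    q=*:*    match-all-documents query, very fast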
> 2. can we see the DocValues cache in plugin/ stats->cache-> section on
> solr UI panel ?
The admin UI only shows Solr caches. If Lucene even has a docValues
cache (and I do not know whether it does), it will not be available in
Solr's statistics. I am unaware of any cache in Solr for docValues.
The entire point of docValues is to avoid the need to generate and cache
large amounts of data, so I suspect there is not going to be anything
available in this regard.
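For reference, docValues is just a per-field attribute in schema.xml (a
generic example, not your actual field definition), and changing it
requires a full reindex:

    <field name="cat" type="string" indexed="true" stored="true" docValues="true"/>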
Thanks,
Shawn