Hi Jan and Shawn, For your info, this is another debug query.
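For reference, a debug response like the one below is typically produced by adding debugQuery=true to the search request, for example (a sketch only; the host, context path, collection and query term here are assumptions based on URLs mentioned elsewhere in this thread):

https://localhost:8983/edm/customers/select?q=johnny&debugQuery=true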
"debug":{ "rawquerystring":"johnny", "querystring":"johnny", "parsedquery":"searchFields_tcs:johnny", "parsedquery_toString":"searchFields_tcs:johnny", "explain":{ "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in 75730) [SchemaSimilarity], result of:\n 12.8497505 = score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n 7.5108404 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 473.0 = docFreq\n 865438.0 = docCount\n 1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 4.0 = termFreq=4.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 26.66791 = avgFieldLength\n 25.0 = fieldLength\n”, "QParser":"LuceneQParser", "timing":{ "time":350.0, "prepare":{ "time":0.0, "query":{ "time":0.0}, "facet":{ "time":0.0}, "facet_module":{ "time":0.0}, "mlt":{ "time":0.0}, "highlight":{ "time":0.0}, "stats":{ "time":0.0}, "expand":{ "time":0.0}, "terms":{ "time":0.0}, "debug":{ "time":0.0}}, "process":{ "time":348.0, "query":{ "time":287.0}, "facet":{ "time":0.0}, "facet_module":{ "time":0.0}, "mlt":{ "time":0.0}, "highlight":{ "time":54.0}, "stats":{ "time":0.0}, "expand":{ "time":0.0}, "terms":{ "time":0.0}, "debug":{ "time":6.0}}, "loadFieldValues":{ "time":0.0}}}} Regards, Edwin On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > Hi Jan and Shawn, > > Please focus on the strange issue that I have described above in more > details, summary is as follows: > > 1. Index customers data, then queries from highlight, select, and all > handlers are very fast (less than 50ms) > > 2. Now index policies data, then queries on polices are very fast BUT > queries on customers becomes slow > > 3. If I reindex customers data, then again queries for customers are very > fast BUT queries on policies becomes slow. > > How can you explain this behavior? > > We have never experienced such a strange behavior before Solr 7. > > Regards, > Edwin > > On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <edwinye...@gmail.com> > wrote: > >> Hi Jan, >> >> Referring to what you have mentioned that the highlighting takes up most >> of the time in the first query from the policies collection, the >> highlighting was very fast (less than 50ms) from the time it was indexed, >> till the time after customers collection gets indexed, in which it slowed >> down tremendously. >> >> Also, the slow down does not just affect on the highlight requestHandler. >> It also affects other requestHandler like select and clustering as well. >> All of them gets the QTime of more than 500ms. This is also proven in the >> latest debug query that we have sent earlier, in which we have set hl=false >> and fl=null. >> >> Any idea or explanation on this strange behavior? >> Thank you for your support, as we look forward to shed some lights on >> this issue and to resolve it. >> >> Regards, >> Edwin >> >> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <edwinye...@gmail.com> >> wrote: >> >>> Hi Jan, >>> >>> Thanks for your reply. >>> >>> However, we are still getting a slow QTime of 517ms even after we set >>> hl=false&fl=null. 
Below is the debug query:

"debug":{
  "rawquerystring":"cherry",
  "querystring":"cherry",
  "parsedquery":"searchFields_tcs:cherry",
  "parsedquery_toString":"searchFields_tcs:cherry",
  "explain":{
    "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) [SchemaSimilarity], result of:\n 14.227914 = score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n 9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 400.0 = docFreq\n 6000000.0 = docCount\n 1.4798305 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 3.0 = termFreq=3.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 19.397041 = avgFieldLength\n 25.0 = fieldLength\n",
    "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) [SchemaSimilarity], result of:\n 13.937909 = score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n 9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 400.0 = docFreq\n 6000000.0 = docCount\n 1.4496675 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 3.0 = termFreq=3.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 19.397041 = avgFieldLength\n 27.0 = fieldLength\n"},
  "QParser":"LuceneQParser",
  "timing":{
    "time":517.0,
    "prepare":{ "time":0.0, "query":{"time":0.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0}, "mlt":{"time":0.0}, "highlight":{"time":0.0}, "stats":{"time":0.0}, "expand":{"time":0.0}, "terms":{"time":0.0}, "debug":{"time":0.0}},
    "process":{ "time":516.0, "query":{"time":15.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0}, "mlt":{"time":0.0}, "highlight":{"time":0.0}, "stats":{"time":0.0}, "expand":{"time":0.0}, "terms":{"time":0.0}, "debug":{"time":500.0}}}}}

Regards,
Edwin

On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <jan....@cominvent.com> wrote:

Looks like highlighting takes most of the time on the first query (680ms). Your config seems to ask for a lot of highlighting here, like 100 snippets of max 100000 characters, etc. It sounds to me like this might be a highlighting configuration problem. Try disabling highlighting (hl=false) and see if you get back your speed.

Also, I see fl=* in your config, which asks for all fields. Are you sure you want that? That may also be slow. Try to ask for just the fields you will be using.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
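As an illustration of the kind of request being suggested here (a sketch only; the collection name, field names and parameter values are assumptions, not values taken from the actual configuration), one might first test with highlighting off and a narrow field list:

https://localhost:8983/edm/policies/select?q=sherry&hl=false&fl=id,score

and then, if highlighting is still needed, re-enable it with tighter limits, e.g.:

https://localhost:8983/edm/policies/select?q=sherry&hl=true&hl.fl=searchFields_tcs&hl.snippets=3&hl.fragsize=200&fl=id,score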
On 24 Jan 2019, at 14:59, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Thanks for your reply.

Below is what you requested about our Solr setup: configuration files, schema and results of debug queries. Looking forward to your advice and support on our problem.

1. System configuration
OS: Windows 10 Pro 64 bit
System Memory: 32GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical Processor(s)
HDD: 3.0 TB (free 2.1 TB) SATA

2. solrconfig.xml of the customers and policies collections, and solr.in.cmd, which can be downloaded from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

3.1. Debug query from policies (which is slow)

"debug":{
  "rawquerystring":"sherry",
  "querystring":"sherry",
  "parsedquery":"searchFields_tcs:sherry",
  "parsedquery_toString":"searchFields_tcs:sherry",
  "explain":{
    "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in 3097315) [SchemaSimilarity], result of:\n 14.540428 = score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 812.0 = docFreq\n 6000000.0 = docCount\n 1.6324438 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 5.0 = termFreq=5.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 19.397041 = avgFieldLength\n 31.0 = fieldLength\n", ..
  "QParser":"LuceneQParser",
  "timing":{
    "time":681.0,
    "prepare":{ "time":0.0, "query":{"time":0.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0}, "mlt":{"time":0.0}, "highlight":{"time":0.0}, "stats":{"time":0.0}, "expand":{"time":0.0}, "terms":{"time":0.0}, "debug":{"time":0.0}},
    "process":{ "time":680.0, "query":{"time":19.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0}, "mlt":{"time":0.0}, "highlight":{"time":651.0}, "stats":{"time":0.0}, "expand":{"time":0.0}, "terms":{"time":0.0}, "debug":{"time":8.0}},
    "loadFieldValues":{"time":12.0}}}}

3.2. Debug query from customers (which is fast because we indexed it after indexing policies):

"debug":{
  "rawquerystring":"sherry",
  "querystring":"sherry",
  "parsedquery":"searchFields_tcs:sherry",
  "parsedquery_toString":"searchFields_tcs:sherry",
  "explain":{
    "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in 2453665) [SchemaSimilarity], result of:\n 13.191501 = score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n 9.08604 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 428.0 = docFreq\n 3784142.0 = docCount\n 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 3.0 = termFreq=3.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 20.22558 = avgFieldLength\n 28.0 = fieldLength\n", ..
  "QParser":"LuceneQParser",
  "timing":{
    "time":38.0,
    "prepare":{ "time":1.0, "query":{"time":1.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0}, "mlt":{"time":0.0}, "highlight":{"time":0.0}, "stats":{"time":0.0}, "expand":{"time":0.0}, "terms":{"time":0.0}, "debug":{"time":0.0}},
    "process":{ "time":36.0, "query":{"time":1.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0}, "mlt":{"time":0.0}, "highlight":{"time":31.0}, "stats":{"time":0.0}, "expand":{"time":0.0}, "terms":{"time":0.0}, "debug":{"time":3.0}},
    "loadFieldValues":{"time":13.0}}}}

Best Regards,
Edwin

On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com> wrote:

It would be useful if you can disclose the machine configuration, OS, memory, settings etc., as well as the Solr config, including solr.in.sh, solrconfig.xml etc., so we can see the whole picture of memory, GC, etc.

You could also specify debugQuery=true on a slow search and check the timings section for clues. What QTime are you seeing on the slow queries in solr.log?

If that does not reveal the reason, I'd connect to your Solr instance with a tool like jVisualVM or similar, to inspect what takes time. Or better, hook up to DataDog, SPM or some other cloud tool to get a full view of the system.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24 Jan 2019, at 13:42, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Shawn,

Unfortunately, your reply about memory may not be valid. Please refer to my explanation below of the strange behavior (isn't it much more like a BUG than anything else that is explainable?).

Note that we still have 18GB of free unused memory on the server.

1. We indexed the first collection, called customers (3.7 million records from CSV data); the index size is 2.09GB. A search in customers for any keyword returns within 50ms (QTime) when using highlighting (unified highlighter, postings, light term vectors).

2. We then indexed the second collection, called policies (6 million records from CSV data); the index size is 2.55GB. A search in policies for any keyword returns within 50ms (QTime) when using highlighting (unified highlighter, postings, light term vectors).

3. But now any search in customers for any keyword (not from cache) takes as long as 1200ms (QTime), while the policies search remains very fast (50ms).
4. So we decided to run the force optimize command on the customers collection (https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false). Surprisingly, after optimization the search on the customers collection for any keyword becomes very fast again (less than 50ms). BUT strangely, the search in the policies collection becomes very slow (around 1200ms), without any changes to the policies collection.

5. Based on the above result, we decided to run the force optimize command on the policies collection (https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false). More surprisingly, after optimization the search on the policies collection for any keyword becomes very fast again (less than 50ms). BUT more strangely, the search in the customers collection again becomes very slow (around 1200ms), without any changes to the customers collection.

What a strange and unexpected behavior! If this is not a bug, how could you explain the above very strange behavior in Solr 7.5? Could it be a bug?

We would appreciate any support or help with our situation.

Thank you.

Regards,
Edwin

On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Shawn,

> If the two collections have data on the same server(s), I can see this
> happening. More memory is consumed when there is additional data, and
> when Solr needs more memory, performance might be affected. The
> solution is generally to install more memory in the server.

I have found that even after we delete the index in collection2, the query QTime for collection1 still remains slow. It does not go back to its previous fast speed from before we indexed collection2.

Regards,
Edwin

On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi Shawn,

Thanks for your reply.

The log only shows a list of the following, and I don't see any other logs besides these:

2019-01-24 02:47:57.925 INFO (qtp2131952342-1330) [c:collectioin1 s:shard1 r:core_node4 x:collection1_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245417
2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 s:shard1 r:core_node4 x:collection1_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245430
2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 s:shard1 r:core_node4 x:collection1_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245435

There is no change to the segments info, but the slowdown in the first collection is very drastic.
Before the indexing of collection2, the collection1 query QTime is in the range of 4 to 50 ms. However, after indexing collection2, the collection1 query QTime increases to more than 1000 ms. The indexing is done in CSV format, and the size of the index is 3GB.

Regards,
Edwin

On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <apa...@elyograg.org> wrote:

On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> I am using Solr 7.5.0, and currently I am facing an issue where, when I am
> indexing in collection2, the indexing affects the records in collection1.
> Although the records are still intact, it seems that the settings of the
> termVectors get wiped out, and the index size of collection1 reduced from
> 3.3GB to 2.1GB after I do the indexing in collection2.

This should not be possible. Indexing in one collection should have absolutely no effect on another collection.

If logging has been left at its default settings, the solr.log file should have enough info to show what actually happened.

> Also, the search in collection1, which was originally very fast, becomes
> very slow after the indexing is done in collection2.

If the two collections have data on the same server(s), I can see this happening. More memory is consumed when there is additional data, and when Solr needs more memory, performance might be affected. The solution is generally to install more memory in the server. If the system is working, there should be no need to increase the heap size when the memory size increases ... but there can be situations where the heap is a little bit too small, where you WOULD want to increase the heap size.

Thanks,
Shawn
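If the heap does turn out to be too small, it is normally raised in the Solr include script (solr.in.cmd on Windows) and Solr restarted; the value below is only an illustration, not a recommendation tested against this setup:

set SOLR_HEAP=4g

(or, equivalently, set SOLR_JAVA_MEM=-Xms4g -Xmx4g).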