Looks like highlighting takes most of the time on the first query (651ms of the 680ms process time). Your config asks for a lot of highlighting here, e.g. 100 snippets of max 100000 characters, so this looks like a highlighting configuration problem. Try disabling highlighting (hl=false) and see if you get your speed back. Also, I see fl=* in your config, which asks for all stored fields to be returned. Are you sure you want that? It may also be slow. Try requesting just the fields you will actually use.
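As a quick test, something like the following against the slow collection should confirm it (illustrative only; the collection and field names are taken from your debug output, so adjust the URL path and the field list to match your setup):

https://localhost:8983/edm/policies/select?q=sherry&fl=id,searchFields_tcs&hl=false&debugQuery=true

If that returns in tens of milliseconds, like the customers query does, the time is going into the highlighter, and you can then tune the highlighting parameters (e.g. hl.snippets and hl.maxAnalyzedChars) down to something reasonable instead of turning highlighting off entirely.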
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 24 Jan 2019, at 14:59, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>
> Thanks for your reply.
>
> Below is what you requested about our Solr setup: configuration
> files, schema and results of debug queries.
>
> Looking forward to your advice and support on our problem.
>
> 1. System configuration
> OS: Windows 10 Pro 64 bit
> System Memory: 32GB
> CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical Processor(s)
> HDD: 3.0 TB (free 2.1 TB) SATA
>
> 2. solrconfig.xml of the customers and policies collections, and solr.in.cmd,
> which can be downloaded from the following link:
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
>
> 3. The debug queries from both collections
>
> *3.1. Debug query from policies (which is slow)*
>
> "debug":{
>   "rawquerystring":"sherry",
>   "querystring":"sherry",
>   "parsedquery":"searchFields_tcs:sherry",
>   "parsedquery_toString":"searchFields_tcs:sherry",
>   "explain":{
>     "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in 3097315) [SchemaSimilarity], result of:\n
>       14.540428 = score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
>       8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n
>         812.0 = docFreq\n
>         6000000.0 = docCount\n
>       1.6324438 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
>         5.0 = termFreq=5.0\n
>         1.2 = parameter k1\n
>         0.75 = parameter b\n
>         19.397041 = avgFieldLength\n
>         31.0 = fieldLength\n", ...
>   "QParser":"LuceneQParser",
>   "timing":{
>     "time":681.0,
>     "prepare":{
>       "time":0.0,
>       "query":{"time":0.0},
>       "facet":{"time":0.0},
>       "facet_module":{"time":0.0},
>       "mlt":{"time":0.0},
>       "highlight":{"time":0.0},
>       "stats":{"time":0.0},
>       "expand":{"time":0.0},
>       "terms":{"time":0.0},
>       "debug":{"time":0.0}},
>     "process":{
>       "time":680.0,
>       "query":{"time":19.0},
>       "facet":{"time":0.0},
>       "facet_module":{"time":0.0},
>       "mlt":{"time":0.0},
>       "highlight":{"time":651.0},
>       "stats":{"time":0.0},
>       "expand":{"time":0.0},
>       "terms":{"time":0.0},
>       "debug":{"time":8.0}},
>     "loadFieldValues":{"time":12.0}}}}
>
> *3.2. Debug query from customers (which is fast because we indexed it after
> indexing policies):*
>
> "debug":{
>   "rawquerystring":"sherry",
>   "querystring":"sherry",
>   "parsedquery":"searchFields_tcs:sherry",
>   "parsedquery_toString":"searchFields_tcs:sherry",
>   "explain":{
>     "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in 2453665) [SchemaSimilarity], result of:\n
>       13.191501 = score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n
>       9.08604 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n
>         428.0 = docFreq\n
>         3784142.0 = docCount\n
>       1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
>         3.0 = termFreq=3.0\n
>         1.2 = parameter k1\n
>         0.75 = parameter b\n
>         20.22558 = avgFieldLength\n
>         28.0 = fieldLength\n", ...
> > "QParser":"LuceneQParser", > > "timing":{ > > "time":38.0, > > "prepare":{ > > "time":1.0, > > "query":{ > > "time":1.0}, > > "facet":{ > > "time":0.0}, > > "facet_module":{ > > "time":0.0}, > > "mlt":{ > > "time":0.0}, > > "highlight":{ > > "time":0.0}, > > "stats":{ > > "time":0.0}, > > "expand":{ > > "time":0.0}, > > "terms":{ > > "time":0.0}, > > "debug":{ > > "time":0.0}}, > > "process":{ > > "time":36.0, > > "query":{ > > "time":1.0}, > > "facet":{ > > "time":0.0}, > > "facet_module":{ > > "time":0.0}, > > "mlt":{ > > "time":0.0}, > > "highlight":{ > > "time":31.0}, > > "stats":{ > > "time":0.0}, > > "expand":{ > > "time":0.0}, > > "terms":{ > > "time":0.0}, > > "debug":{ > > "time":3.0}}, > > "loadFieldValues":{ > > "time":13.0}}}} > > > > Best Regards, > Edwin > > On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com> wrote: > >> It would be useful if you can disclose the machine configuration, OS, >> memory, settings etc, as well as solr config including solr.in < >> http://solr.in/>.sh, solrconfig.xml etc, so we can see the whole picture >> of memory, GC, etc. >> You could also specify debugQuery=true on a slow search and check the >> timings section for clues. What QTime are you seeing on the slow queries in >> solr.log? >> If that does not reveal the reason, I'd connect to your solr instance with >> a tool like jVisualVM or similar, to inspect what takes time. Or better, >> hook up to DataDog, SPM or some other cloud tool to get a full view of the >> system. >> >> -- >> Jan Høydahl, search solution architect >> Cominvent AS - www.cominvent.com >> >>> 24. jan. 2019 kl. 13:42 skrev Zheng Lin Edwin Yeo <edwinye...@gmail.com >>> : >>> >>> Hi Shawn, >>> >>> Unfortunately your reply of memory may not be valid. Please refer to my >>> explanation below of the strange behaviors (is it much more like a BUG >> than >>> anything else that is explainable): >>> >>> Note that we still have 18GB of free unused memory on the server. >>> >>> 1. We indexed the first collection called customers (3.7 millioin records >>> from CSV data), index size is 2.09GB. The search in customers for any >>> keyword is returned within 50ms (QTime) for using highlight (unified >>> highlighter, posting, light term vectors) >>> >>> 2. Then we indexed the second collection called policies (6 million >> records >>> from CSV data), index size is 2.55GB. The search in policies for any >>> keyword is returned within 50ms (QTime) for using highlight (unified >>> highlighter, posting, light term vectors) >>> >>> 3. But now any search in customers for any keywords (not from cache) >> takes >>> as high as 1200ms (QTime). But still policies search remains very fast >>> (50ms). >>> >>> 4. So we decided to run the force optimize command on customers >> collection ( >>> >> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false >> ), >>> surprisingly after optimization the search on customers collection for >> any >>> keywords become very fast again (less than 50ms). BUT strangely, the >> search >>> in policies collection become very slow (around 1200ms) without any >> changes >>> to the policies collection. >>> >>> 5. Based on above result, we decided to run the force optimize command on >>> policies collection ( >>> >> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false >> ). >>> More surprisingly, after optimization the search on policies collection >> for >>> any keywords become very fast again (less than 50ms). 
>>> BUT more strangely, the search in the customers collection again became very
>>> slow (around 1200ms) without any changes to the customers collection.
>>>
>>> What a strange and unexpected behaviour! If this is not a bug, how could you
>>> explain the above very strange behaviour in Solr 7.5? Could it be a bug?
>>>
>>> We would appreciate any support or help with our situation.
>>>
>>> Thank you.
>>>
>>> Regards,
>>> Edwin
>>>
>>> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>> wrote:
>>>
>>>> Hi Shawn,
>>>>
>>>>> If the two collections have data on the same server(s), I can see this
>>>>> happening. More memory is consumed when there is additional data, and
>>>>> when Solr needs more memory, performance might be affected. The
>>>>> solution is generally to install more memory in the server.
>>>>
>>>> I have found that even after we delete the index in collection2, the query
>>>> QTime for collection1 still remains slow. It does not go back to its
>>>> previous fast speed from before we indexed collection2.
>>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Shawn,
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> The log only shows a list of the following, and I don't see any other
>>>>> logs besides these.
>>>>>
>>>>> 2019-01-24 02:47:57.925 INFO (qtp2131952342-1330) [c:collectioin1
>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>>>> id=13245417
>>>>> 2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1
>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>>>> id=13245430
>>>>> 2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1
>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
>>>>> id=13245435
>>>>>
>>>>> There is no change to the segments info, but the slowdown in the first
>>>>> collection is very drastic.
>>>>> Before the indexing of collection2, the collection1 query QTime is in
>>>>> the range of 4 to 50 ms. However, after indexing collection2, the
>>>>> collection1 query QTime increases to more than 1000 ms. The indexing is
>>>>> done from CSV data, and the size of the index is 3GB.
>>>>>
>>>>> Regards,
>>>>> Edwin
>>>>>
>>>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>
>>>>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>>>>>>> I am using Solr 7.5.0, and currently I am facing an issue where, when
>>>>>>> I am indexing in collection2, the indexing affects the records in
>>>>>>> collection1. Although the records are still intact, it seems that the
>>>>>>> settings of the termVectors get wiped out, and the index size of
>>>>>>> collection1 reduced from 3.3GB to 2.1GB after I do the indexing in
>>>>>>> collection2.
>>>>>>
>>>>>> This should not be possible. Indexing in one collection should have
>>>>>> absolutely no effect on another collection.
>>>>>>
>>>>>> If logging has been left at its default settings, the solr.log file
>>>>>> should have enough info to show what actually happened.
>>>>>>
>>>>>>> Also, the search in
>>>>>>> collection1, which was originally very fast, becomes very slow after
>>>>>>> the indexing is done in collection2.
>>>>>>
>>>>>> If the two collections have data on the same server(s), I can see this
>>>>>> happening. More memory is consumed when there is additional data, and
>>>>>> when Solr needs more memory, performance might be affected. The
>>>>>> solution is generally to install more memory in the server. If the
>>>>>> system is working, there should be no need to increase the heap size
>>>>>> when the memory size increases ... but there can be situations where the
>>>>>> heap is a little bit too small, where you WOULD want to increase the
>>>>>> heap size.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn