Hi Jan,

Thanks for your reply.

However, we are still getting a slow QTime of 517ms even after we set
hl=false&fl=null.
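
For reference, the request was issued roughly along these lines (the host and
collection name follow the same pattern as the optimize URLs quoted later in
this thread; treat the exact URL as illustrative):

  https://localhost:8983/edm/policies/select?q=cherry&hl=false&fl=null&debugQuery=true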

Below is the debug query:

  "debug":{
    "rawquerystring":"cherry",
    "querystring":"cherry",
    "parsedquery":"searchFields_tcs:cherry",
    "parsedquery_toString":"searchFields_tcs:cherry",
    "explain":{
      "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in
5747763) [SchemaSimilarity], result of:\n  14.227914 =
score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 =
docCount\n    1.4798305 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter
b\n      19.397041 = avgFieldLength\n      25.0 = fieldLength\n",
      "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in
4840794) [SchemaSimilarity], result of:\n  13.937909 =
score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 =
docCount\n    1.4496675 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter
b\n      19.397041 = avgFieldLength\n      27.0 = fieldLength\n"},
    "QParser":"LuceneQParser",
    "timing":{
      "time":517.0,
      "prepare":{
        "time":0.0,
        "query":{
          "time":0.0},
        "facet":{
          "time":0.0},
        "facet_module":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "terms":{
          "time":0.0},
        "debug":{
          "time":0.0}},
      "process":{
        "time":516.0,
        "query":{
          "time":15.0},
        "facet":{
          "time":0.0},
        "facet_module":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "terms":{
          "time":0.0},
        "debug":{
          "time":500.0}}}}}

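As a quick sanity check on the explain output above (this says nothing about
the timing), plugging the numbers from the first entry into the BM25 formulas
it reports reproduces the scores:

  idf    = ln(1 + (6000000.0 - 400.0 + 0.5) / (400.0 + 0.5))  ≈ 9.6146
  tfNorm = (3.0 * (1.2 + 1)) / (3.0 + 1.2 * (1 - 0.75 + 0.75 * 25.0 / 19.397041))  ≈ 1.4798
  score  = idf * tfNorm  ≈ 14.2279  (the reported weight for document 46226513)
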
Regards,
Edwin


On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <jan....@cominvent.com> wrote:

> Looks like highlighting takes most of the time on the first query (680ms).
> Your config seems to ask for a lot of highlighting here, like 100 snippets
> of max 100000 characters etc.
> Sounds to me like this might be a highlighting configuration problem. Try
> disabling highlighting (hl=false) and see if you get your speed back.
> Also, I see fl=* in your config, which is really asking for all fields.
> Are you sure you want that? It may also be slow. Try to ask for just the
> fields you will be using.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 24 Jan 2019, at 14:59, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Thanks for your reply.
> >
> > Below is what you requested about our Solr setup: configuration files,
> > schema, and results of the debug queries:
> >
> > Looking forward to your advice and support on our problem.
> >
> > 1. System configurations
> > OS: Windows 10 Pro 64 bit
> > System Memory: 32GB
> > CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
> > Processor(s)
> > HDD: 3.0 TB (free 2.1 TB)  SATA
> >
> > 2. solrconfig.xml of the customers and policies collections, and solr.in.cmd,
> > which can be downloaded from the following link:
> >
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
> >
> > 3. The debug queries from both collections
> >
> > *3.1. Debug Query From Policies ( which is Slow)*
> >
> >  "debug":{
> >
> >    "rawquerystring":"sherry",
> >
> >    "querystring":"sherry",
> >
> >    "parsedquery":"searchFields_tcs:sherry",
> >
> >    "parsedquery_toString":"searchFields_tcs:sherry",
> >
> >    "explain":{
> >
> >      "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
> > 3097315) [SchemaSimilarity], result of:\n  14.540428 =
> > score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
> > 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> > (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
> > docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
> > (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
> > 5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter
> > b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n",..
> >
> >    "QParser":"LuceneQParser",
> >
> >    "timing":{
> >
> >      "time":681.0,
> >
> >      "prepare":{
> >
> >        "time":0.0,
> >
> >        "query":{
> >
> >          "time":0.0},
> >
> >        "facet":{
> >
> >          "time":0.0},
> >
> >        "facet_module":{
> >
> >          "time":0.0},
> >
> >        "mlt":{
> >
> >          "time":0.0},
> >
> >        "highlight":{
> >
> >          "time":0.0},
> >
> >        "stats":{
> >
> >          "time":0.0},
> >
> >        "expand":{
> >
> >          "time":0.0},
> >
> >        "terms":{
> >
> >          "time":0.0},
> >
> >        "debug":{
> >
> >          "time":0.0}},
> >
> >      "process":{
> >
> >        "time":680.0,
> >
> >        "query":{
> >
> >          "time":19.0},
> >
> >        "facet":{
> >
> >          "time":0.0},
> >
> >        "facet_module":{
> >
> >          "time":0.0},
> >
> >        "mlt":{
> >
> >          "time":0.0},
> >
> >        "highlight":{
> >
> >          "time":651.0},
> >
> >        "stats":{
> >
> >          "time":0.0},
> >
> >        "expand":{
> >
> >          "time":0.0},
> >
> >        "terms":{
> >
> >          "time":0.0},
> >
> >        "debug":{
> >
> >          "time":8.0}},
> >
> >      "loadFieldValues":{
> >
> >        "time":12.0}}}}
> >
> >
> >
> > *3.2. Debug Query From Customers (which is fast because we indexed it after
> > indexing Policies):*
> >
> >
> >
> >  "debug":{
> >
> >    "rawquerystring":"sherry",
> >
> >    "querystring":"sherry",
> >
> >    "parsedquery":"searchFields_tcs:sherry",
> >
> >    "parsedquery_toString":"searchFields_tcs:sherry",
> >
> >    "explain":{
> >
> >      "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
> > 2453665) [SchemaSimilarity], result of:\n  13.191501 =
> > score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604
> > = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> > 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
> > 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> > b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
> > termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
> > 20.22558 = avgFieldLength\n      28.0 = fieldLength\n", ..
> >
> >    "QParser":"LuceneQParser",
> >
> >    "timing":{
> >
> >      "time":38.0,
> >
> >      "prepare":{
> >
> >        "time":1.0,
> >
> >        "query":{
> >
> >          "time":1.0},
> >
> >        "facet":{
> >
> >          "time":0.0},
> >
> >        "facet_module":{
> >
> >          "time":0.0},
> >
> >        "mlt":{
> >
> >          "time":0.0},
> >
> >        "highlight":{
> >
> >          "time":0.0},
> >
> >        "stats":{
> >
> >          "time":0.0},
> >
> >        "expand":{
> >
> >          "time":0.0},
> >
> >        "terms":{
> >
> >          "time":0.0},
> >
> >        "debug":{
> >
> >          "time":0.0}},
> >
> >      "process":{
> >
> >        "time":36.0,
> >
> >        "query":{
> >
> >          "time":1.0},
> >
> >        "facet":{
> >
> >          "time":0.0},
> >
> >        "facet_module":{
> >
> >          "time":0.0},
> >
> >        "mlt":{
> >
> >          "time":0.0},
> >
> >        "highlight":{
> >
> >          "time":31.0},
> >
> >        "stats":{
> >
> >          "time":0.0},
> >
> >        "expand":{
> >
> >          "time":0.0},
> >
> >        "terms":{
> >
> >          "time":0.0},
> >
> >        "debug":{
> >
> >          "time":3.0}},
> >
> >      "loadFieldValues":{
> >
> >        "time":13.0}}}}
> >
> >
> >
> > Best Regards,
> > Edwin
> >
> > On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com> wrote:
> >
> >> It would be useful if you could disclose the machine configuration, OS,
> >> memory, settings etc, as well as the solr config including solr.in.sh,
> >> solrconfig.xml etc, so we can see the whole picture of memory, GC, etc.
> >> You could also specify debugQuery=true on a slow search and check the
> >> timings section for clues. What QTime are you seeing on the slow queries
> >> in solr.log?
> >> If that does not reveal the reason, I'd connect to your solr instance with
> >> a tool like jVisualVM or similar, to inspect what takes time. Or better,
> >> hook up to DataDog, SPM or some other cloud tool to get a full view of
> >> the system.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>> On 24 Jan 2019, at 13:42, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>
> >>> Hi Shawn,
> >>>
> >>> Unfortunately, your explanation about memory may not be valid. Please
> >>> refer to my description below of the strange behavior (it looks much more
> >>> like a BUG than anything else that is explainable):
> >>>
> >>> Note that we still have 18GB of free unused memory on the server.
> >>>
> >>> 1. We indexed the first collection, called customers (3.7 million records
> >>> from CSV data); the index size is 2.09GB. A search in customers for any
> >>> keyword returns within 50ms (QTime) when using highlighting (unified
> >>> highlighter, postings, light term vectors).
> >>>
> >>> 2. Then we indexed the second collection, called policies (6 million
> >>> records from CSV data); the index size is 2.55GB. A search in policies for
> >>> any keyword returns within 50ms (QTime) when using highlighting (unified
> >>> highlighter, postings, light term vectors).
> >>>
> >>> 3. But now any search in customers for any keyword (not from cache) takes
> >>> as long as 1200ms (QTime), while the policies search remains very fast
> >>> (50ms).
> >>>
> >>> 4. So we decided to run the force optimize command on the customers
> >>> collection (
> >>> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false
> >>> ). Surprisingly, after optimization the search on the customers collection
> >>> for any keyword became very fast again (less than 50ms). BUT strangely,
> >>> the search in the policies collection became very slow (around 1200ms)
> >>> without any changes to the policies collection.
> >>>
> >>> 5. Based on the above result, we decided to run the force optimize command
> >>> on the policies collection (
> >>> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false
> >>> ). More surprisingly, after optimization the search on the policies
> >>> collection for any keyword became very fast again (less than 50ms). BUT
> >>> more strangely, the search in the customers collection again became very
> >>> slow (around 1200ms) without any changes to the customers collection.
> >>>
> >>> What a strange and unexpected behavior! If this is not a bug, how would
> >>> you explain the above behavior in Solr 7.5? Could it be a bug?
> >>>
> >>> We would appreciate any support or help on our above situation.
> >>>
> >>> Thank you.
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Shawn,
> >>>>
> >>>>> If the two collections have data on the same server(s), I can see
> this
> >>>>> happening.  More memory is consumed when there is additional data,
> and
> >>>>> when Solr needs more memory, performance might be affected.  The
> >>>>> solution is generally to install more memory in the server.
> >>>>
> >>>> I have found that even after we delete the index in collection2, the
> >>>> query QTime for collection1 still remains slow. It does not go back to
> >>>> its previously fast speed from before we indexed collection2.
> >>>>
> >>>> Regards,
> >>>> Edwin
> >>>>
> >>>>
> >>>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >>>
> >>>> wrote:
> >>>>
> >>>>> Hi Shawn,
> >>>>>
> >>>>> Thanks for your reply.
> >>>>>
> >>>>> The log only shows a list of the following, and I don't see any other
> >>>>> logs besides these.
> >>>>>
> >>>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
> >>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> >> update-script#processAdd:
> >>>>> id=13245417
> >>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
> >>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> >> update-script#processAdd:
> >>>>> id=13245430
> >>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
> >>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> >> update-script#processAdd:
> >>>>> id=13245435
> >>>>>
> >>>>> There is no change to the segments info, but the slowdown in the first
> >>>>> collection is very drastic.
> >>>>> Before the indexing of collection2, the collection1 query QTimes are in
> >>>>> the range of 4 to 50 ms. However, after indexing collection2, the
> >>>>> collection1 query QTime increases to more than 1000 ms. The indexing is
> >>>>> done from CSV data, and the size of the index is 3GB.
> >>>>>
> >>>>> Regards,
> >>>>> Edwin
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <apa...@elyograg.org>
> >> wrote:
> >>>>>
> >>>>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> >>>>>>> I am using Solr 7.5.0, and currently I am facing an issue where, when
> >>>>>>> I am indexing in collection2, the indexing affects the records in
> >>>>>>> collection1.
> >>>>>>> Although the records are still intact, it seems that the settings of
> >>>>>>> the termVectors get wiped out, and the index size of collection1
> >>>>>>> reduced from 3.3GB to 2.1GB after I do the indexing in collection2.
> >>>>>>
> >>>>>> This should not be possible.  Indexing in one collection should have
> >>>>>> absolutely no effect on another collection.
> >>>>>>
> >>>>>> If logging has been left at its default settings, the solr.log file
> >>>>>> should have enough info to show what actually happened.
> >>>>>>
> >>>>>>> Also, the search in
> >>>>>>> collection1, which was originally very fast, becomes very slow after
> >>>>>>> the indexing is done in collection2.
> >>>>>>
> >>>>>> If the two collections have data on the same server(s), I can see
> this
> >>>>>> happening.  More memory is consumed when there is additional data,
> and
> >>>>>> when Solr needs more memory, performance might be affected.  The
> >>>>>> solution is generally to install more memory in the server.  If the
> >>>>>> system is working, there should be no need to increase the heap size
> >>>>>> when the memory size increases ... but there can be situations where
> >> the
> >>>>>> heap is a little bit too small, where you WOULD want to increase the
> >>>>>> heap size.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Shawn
> >>>>>>
> >>>>>>
> >>
> >>
>
>
