Hi Jan,

Below is the command that we used to start Solr:

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z "localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause


We also have a replica; in this development setup it runs on the same PC to
simulate a second node.
Below is the command to start the replica, which uses port 8984.

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8984 -s solrMain\node2 -m 6g -z "localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause
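
For reference, here is a rough sketch of how the same settings could instead be
kept in solr.in.cmd (assuming the standard variables from the stock 7.5
solr.in.cmd; each node would still need its own port and solr home, and this is
only a sketch, not our actual file):

REM Sketch only - hypothetical solr.in.cmd entries, not the file we sent earlier
set SOLR_PORT=8983
set SOLR_JAVA_MEM=-Xms6g -Xmx6g
set ZK_HOST=localhost:2181,localhost:2182,localhost:2183
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.ltr.enabled=true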

Regards,
Edwin

On Fri, 25 Jan 2019 at 22:35, Jan Høydahl <jan....@cominvent.com> wrote:

> How do you start Solr? The solr.in.cmd you sent does not contain the
> memory settings. What other parameters do you start Solr with?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 25 Jan 2019 at 15:28, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Hi Jan,
> >
> > We are using 64 bit Java, version 1.8.0_191.
> > We started Solr with 6 GB heap size.
> >
> > Besides Solr, we have ZooKeeper, IIS, Google Chrome and NotePad++
> running
> > on the machine. There is still 22 GB of memory left on the server, out of
> > the 32 GB available on the machine.
> >
> > Regards,
> > Edwin
> >
> > On Fri, 25 Jan 2019 at 21:04, Jan Høydahl <jan....@cominvent.com> wrote:
> >
> >> Which java version? 32 or 64 bit? You start Solr with default 512Mb heap
> >> size?
> >> Other software running on the machine?
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>> On 25 Jan 2019 at 13:05, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>
> >>> Hi Jan and Shawn,
> >>>
> >>> For your info, this is another debug query.
> >>>
> >>> "debug":{
> >>>
> >>>   "rawquerystring":"johnny",
> >>>
> >>>   "querystring":"johnny",
> >>>
> >>>   "parsedquery":"searchFields_tcs:johnny",
> >>>
> >>>   "parsedquery_toString":"searchFields_tcs:johnny",
> >>>
> >>>   "explain":{
> >>>
> >>>     "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
> >>> 75730) [SchemaSimilarity], result of:\n  12.8497505 =
> >>> score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n    7.5108404
> >>> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> >>> 0.5)) from:\n      473.0 = docFreq\n      865438.0 = docCount\n
> >>> 1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> >>> b + b * fieldLength / avgFieldLength)) from:\n      4.0 =
> >>> termFreq=4.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
> >>> 26.66791 = avgFieldLength\n      25.0 = fieldLength\n",
> >>>
> >>>   "QParser":"LuceneQParser",
> >>>
> >>>   "timing":{
> >>>
> >>>     "time":350.0,
> >>>
> >>>     "prepare":{
> >>>
> >>>       "time":0.0,
> >>>
> >>>       "query":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "facet":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "facet_module":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "mlt":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "highlight":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "stats":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "expand":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "terms":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "debug":{
> >>>
> >>>         "time":0.0}},
> >>>
> >>>     "process":{
> >>>
> >>>       "time":348.0,
> >>>
> >>>       "query":{
> >>>
> >>>         "time":287.0},
> >>>
> >>>       "facet":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "facet_module":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "mlt":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "highlight":{
> >>>
> >>>         "time":54.0},
> >>>
> >>>       "stats":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "expand":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "terms":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "debug":{
> >>>
> >>>         "time":6.0}},
> >>>
> >>>     "loadFieldValues":{
> >>>
> >>>       "time":0.0}}}}
> >>>
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>>
> >>> On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Jan and Shawn,
> >>>>
> >>>> Please focus on the strange issue that I have described above in more
> >>>> detail; a summary is as follows:
> >>>>
> >>>> 1. Index customers data, then queries from highlight, select, and all
> >>>> handlers are very fast (less than 50ms)
> >>>>
> >>>> 2. Now index policies data, then queries on policies are very fast BUT
> >>>> queries on customers become slow
> >>>>
> >>>> 3. If I reindex customers data, then again queries for customers are
> >>>> very fast BUT queries on policies become slow.
> >>>>
> >>>> How can you explain this behavior?
> >>>>
> >>>> We have never experienced such strange behavior before Solr 7.
> >>>>
> >>>> Regards,
> >>>> Edwin
> >>>>
> >>>> On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com
> >>>
> >>>> wrote:
> >>>>
> >>>>> Hi Jan,
> >>>>>
> >>>>> Regarding your point that the highlighting takes up most of the time
> >>>>> in the first query from the policies collection: the highlighting was
> >>>>> very fast (less than 50ms) from the time the collection was indexed
> >>>>> until the customers collection was indexed, after which it slowed
> >>>>> down tremendously.
> >>>>>
> >>>>> Also, the slowdown does not just affect the highlight requestHandler.
> >>>>> It also affects other requestHandlers like select and clustering as
> >>>>> well. All of them get a QTime of more than 500ms. This is also shown
> >>>>> in the latest debug query that we sent earlier, in which we set
> >>>>> hl=false and fl=null.
> >>>>>
> >>>>> Any idea or explanation for this strange behavior?
> >>>>> Thank you for your support; we look forward to shedding some light on
> >>>>> this issue and resolving it.
> >>>>>
> >>>>> Regards,
> >>>>> Edwin
> >>>>>
> >>>>> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Jan,
> >>>>>>
> >>>>>> Thanks for your reply.
> >>>>>>
> >>>>>> However, we are still getting a slow QTime of 517ms even after we
> set
> >>>>>> hl=false&fl=null.
> >>>>>>
> >>>>>> Below is the debug query:
> >>>>>>
> >>>>>> "debug":{
> >>>>>>   "rawquerystring":"cherry",
> >>>>>>   "querystring":"cherry",
> >>>>>>   "parsedquery":"searchFields_tcs:cherry",
> >>>>>>   "parsedquery_toString":"searchFields_tcs:cherry",
> >>>>>>   "explain":{
> >>>>>>     "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in
> >> 5747763) [SchemaSimilarity], result of:\n  14.227914 =
> >> score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556
> =
> >> idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
> >> from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4798305
> =
> >> tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b *
> >> fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n
> 1.2 =
> >> parameter k1\n      0.75 = parameter b\n      19.397041 =
> avgFieldLength\n
> >>    25.0 = fieldLength\n",
> >>>>>>     "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in
> >> 4840794) [SchemaSimilarity], result of:\n  13.937909 =
> >> score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556
> =
> >> idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
> >> from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4496675
> =
> >> tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b *
> >> fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n
> 1.2 =
> >> parameter k1\n      0.75 = parameter b\n      19.397041 =
> avgFieldLength\n
> >>    27.0 = fieldLength\n",
> >>>>>>   "QParser":"LuceneQParser",
> >>>>>>   "timing":{
> >>>>>>     "time":517.0,
> >>>>>>     "prepare":{
> >>>>>>       "time":0.0,
> >>>>>>       "query":{
> >>>>>>         "time":0.0},
> >>>>>>       "facet":{
> >>>>>>         "time":0.0},
> >>>>>>       "facet_module":{
> >>>>>>         "time":0.0},
> >>>>>>       "mlt":{
> >>>>>>         "time":0.0},
> >>>>>>       "highlight":{
> >>>>>>         "time":0.0},
> >>>>>>       "stats":{
> >>>>>>         "time":0.0},
> >>>>>>       "expand":{
> >>>>>>         "time":0.0},
> >>>>>>       "terms":{
> >>>>>>         "time":0.0},
> >>>>>>       "debug":{
> >>>>>>         "time":0.0}},
> >>>>>>     "process":{
> >>>>>>       "time":516.0,
> >>>>>>       "query":{
> >>>>>>         "time":15.0},
> >>>>>>       "facet":{
> >>>>>>         "time":0.0},
> >>>>>>       "facet_module":{
> >>>>>>         "time":0.0},
> >>>>>>       "mlt":{
> >>>>>>         "time":0.0},
> >>>>>>       "highlight":{
> >>>>>>         "time":0.0},
> >>>>>>       "stats":{
> >>>>>>         "time":0.0},
> >>>>>>       "expand":{
> >>>>>>         "time":0.0},
> >>>>>>       "terms":{
> >>>>>>         "time":0.0},
> >>>>>>       "debug":{
> >>>>>>         "time":500.0}}}}}
> >>>>>>
> >>>>>> Regards,
> >>>>>> Edwin
> >>>>>>
> >>>>>>
> >>>>>> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <jan....@cominvent.com>
> >> wrote:
> >>>>>>
> >>>>>>> Looks like highlighting takes most of the time on the first query
> >>>>>>> (680ms). Your config seems to ask for a lot of highlighting here,
> >>>>>>> like 100 snippets of max 100000 characters etc.
> >>>>>>> Sounds to me like this might be a highlighting configuration problem.
> >>>>>>> Try to disable highlighting (hl=false) and see if you get back your
> >>>>>>> speed.
> >>>>>>> Also, I see fl=* in your config, which is really asking for all
> >>>>>>> fields. Are you sure you want that? That may also be slow. Try to ask
> >>>>>>> for just the fields you will be using.
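> >>>>>>> As a purely illustrative example (the fl list here is hypothetical;
> >>>>>>> the collection, context path and query term are taken from this
> >>>>>>> thread), such a request might look like:
> >>>>>>> https://localhost:8983/edm/policies/select?q=sherry&hl=false&fl=id&debugQuery=true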
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jan Høydahl, search solution architect
> >>>>>>> Cominvent AS - www.cominvent.com
> >>>>>>>
> >>>>>>>> On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for your reply.
> >>>>>>>>
> >>>>>>>> Below are the details you requested about our Solr setup, the
> >>>>>>>> configuration files, schema, and results of the debug queries:
> >>>>>>>>
> >>>>>>>> Looking forward to your advice and support on our problem.
> >>>>>>>>
> >>>>>>>> 1. System configurations
> >>>>>>>> OS: Windows 10 Pro 64 bit
> >>>>>>>> System Memory: 32GB
> >>>>>>>> CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8
> Logical
> >>>>>>>> Processor(s)
> >>>>>>>> HDD: 3.0 TB (free 2.1 TB)  SATA
> >>>>>>>>
> >>>>>>>> 2. solrconfig.xml of the customers and policies collections, and
> >>>>>>>> solr.in.cmd, which can be downloaded from the following link:
> >>>>>>>>
> >>>>>>>
> >>
> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
> >>>>>>>>
> >>>>>>>> 3. The debug queries from both collections
> >>>>>>>>
> >>>>>>>> *3.1. Debug Query From Policies (which is Slow)*
> >>>>>>>>
> >>>>>>>> "debug":{
> >>>>>>>>
> >>>>>>>>  "rawquerystring":"sherry",
> >>>>>>>>
> >>>>>>>>  "querystring":"sherry",
> >>>>>>>>
> >>>>>>>>  "parsedquery":"searchFields_tcs:sherry",
> >>>>>>>>
> >>>>>>>>  "parsedquery_toString":"searchFields_tcs:sherry",
> >>>>>>>>
> >>>>>>>>  "explain":{
> >>>>>>>>
> >>>>>>>>    "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
> >>>>>>>> 3097315) [SchemaSimilarity], result of:\n  14.540428 =
> >>>>>>>> score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
> >>>>>>>> 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
> >>>>>>>> (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
> >>>>>>>> docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
> >>>>>>>> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
> >>>>>>>> 5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 =
> parameter
> >>>>>>>> b\n      19.397041 = avgFieldLength\n      31.0 =
> fieldLength\n",..
> >>>>>>>>
> >>>>>>>>  "QParser":"LuceneQParser",
> >>>>>>>>
> >>>>>>>>  "timing":{
> >>>>>>>>
> >>>>>>>>    "time":681.0,
> >>>>>>>>
> >>>>>>>>    "prepare":{
> >>>>>>>>
> >>>>>>>>      "time":0.0,
> >>>>>>>>
> >>>>>>>>      "query":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "facet":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "facet_module":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "mlt":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "highlight":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "stats":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "expand":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "terms":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "debug":{
> >>>>>>>>
> >>>>>>>>        "time":0.0}},
> >>>>>>>>
> >>>>>>>>    "process":{
> >>>>>>>>
> >>>>>>>>      "time":680.0,
> >>>>>>>>
> >>>>>>>>      "query":{
> >>>>>>>>
> >>>>>>>>        "time":19.0},
> >>>>>>>>
> >>>>>>>>      "facet":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "facet_module":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "mlt":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "highlight":{
> >>>>>>>>
> >>>>>>>>        "time":651.0},
> >>>>>>>>
> >>>>>>>>      "stats":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "expand":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "terms":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "debug":{
> >>>>>>>>
> >>>>>>>>        "time":8.0}},
> >>>>>>>>
> >>>>>>>>    "loadFieldValues":{
> >>>>>>>>
> >>>>>>>>      "time":12.0}}}}
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> *3.2. Debug Query From Customers (which is fast because we indexed
> >>>>>>>> it after indexing Policies):*
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> "debug":{
> >>>>>>>>
> >>>>>>>>  "rawquerystring":"sherry",
> >>>>>>>>
> >>>>>>>>  "querystring":"sherry",
> >>>>>>>>
> >>>>>>>>  "parsedquery":"searchFields_tcs:sherry",
> >>>>>>>>
> >>>>>>>>  "parsedquery_toString":"searchFields_tcs:sherry",
> >>>>>>>>
> >>>>>>>>  "explain":{
> >>>>>>>>
> >>>>>>>>    "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
> >>>>>>>> 2453665) [SchemaSimilarity], result of:\n  13.191501 =
> >>>>>>>> score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n
> >> 9.08604
> >>>>>>>> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> >>>>>>>> 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
> >>>>>>>> 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 *
> (1
> >> -
> >>>>>>>> b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
> >>>>>>>> termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
> >>>>>>>> 20.22558 = avgFieldLength\n      28.0 = fieldLength\n", ..
> >>>>>>>>
> >>>>>>>>  "QParser":"LuceneQParser",
> >>>>>>>>
> >>>>>>>>  "timing":{
> >>>>>>>>
> >>>>>>>>    "time":38.0,
> >>>>>>>>
> >>>>>>>>    "prepare":{
> >>>>>>>>
> >>>>>>>>      "time":1.0,
> >>>>>>>>
> >>>>>>>>      "query":{
> >>>>>>>>
> >>>>>>>>        "time":1.0},
> >>>>>>>>
> >>>>>>>>      "facet":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "facet_module":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "mlt":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "highlight":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "stats":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "expand":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "terms":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "debug":{
> >>>>>>>>
> >>>>>>>>        "time":0.0}},
> >>>>>>>>
> >>>>>>>>    "process":{
> >>>>>>>>
> >>>>>>>>      "time":36.0,
> >>>>>>>>
> >>>>>>>>      "query":{
> >>>>>>>>
> >>>>>>>>        "time":1.0},
> >>>>>>>>
> >>>>>>>>      "facet":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "facet_module":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "mlt":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "highlight":{
> >>>>>>>>
> >>>>>>>>        "time":31.0},
> >>>>>>>>
> >>>>>>>>      "stats":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "expand":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "terms":{
> >>>>>>>>
> >>>>>>>>        "time":0.0},
> >>>>>>>>
> >>>>>>>>      "debug":{
> >>>>>>>>
> >>>>>>>>        "time":3.0}},
> >>>>>>>>
> >>>>>>>>    "loadFieldValues":{
> >>>>>>>>
> >>>>>>>>      "time":13.0}}}}
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>> Edwin
> >>>>>>>>
> >>>>>>>> On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <jan....@cominvent.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> It would be useful if you can disclose the machine configuration,
> >> OS,
> >>>>>>>>> memory, settings etc, as well as solr config including solr.in.sh,
> >>>>>>>>> solrconfig.xml etc, so we can see the whole
> >>>>>>> picture
> >>>>>>>>> of memory, GC, etc.
> >>>>>>>>> You could also specify debugQuery=true on a slow search and check
> >> the
> >>>>>>>>> timings section for clues. What QTime are you seeing on the slow
> >>>>>>> queries in
> >>>>>>>>> solr.log?
> >>>>>>>>> If that does not reveal the reason, I'd connect to your solr
> >>>>>>> instance with
> >>>>>>>>> a tool like jVisualVM or similar, to inspect what takes time. Or
> >>>>>>> better,
> >>>>>>>>> hook up to DataDog, SPM or some other cloud tool to get a full
> view
> >>>>>>> of the
> >>>>>>>>> system.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Jan Høydahl, search solution architect
> >>>>>>>>> Cominvent AS - www.cominvent.com
> >>>>>>>>>
> >>>>>>>>>> On 24 Jan 2019 at 13:42, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Shawn,
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately, your explanation about memory may not apply here.
> >>>>>>>>>> Please refer to my description below of the strange behavior (it
> >>>>>>>>>> looks much more like a BUG than anything else that is explainable):
> >>>>>>>>>>
> >>>>>>>>>> Note that we still have 18GB of free memory on the server.
> >>>>>>>>>>
> >>>>>>>>>> 1. We indexed the first collection, called customers (3.7 million
> >>>>>>>>>> records from CSV data); the index size is 2.09GB. A search in
> >>>>>>>>>> customers for any keyword returns within 50ms (QTime) when using
> >>>>>>>>>> highlighting (unified highlighter, postings, light term vectors).
> >>>>>>>>>>
> >>>>>>>>>> 2. Then we indexed the second collection, called policies (6 million
> >>>>>>>>>> records from CSV data); the index size is 2.55GB. A search in
> >>>>>>>>>> policies for any keyword returns within 50ms (QTime) when using
> >>>>>>>>>> highlighting (unified highlighter, postings, light term vectors).
> >>>>>>>>>>
> >>>>>>>>>> 3. But now any search in customers for any keyword (not from cache)
> >>>>>>>>>> takes as long as 1200ms (QTime), while the policies search remains
> >>>>>>>>>> very fast (50ms).
> >>>>>>>>>>
> >>>>>>>>>> 4. So we decided to run the force optimize command on the customers
> >>>>>>>>>> collection (
> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false
> >>>>>>>>>> ). Surprisingly, after optimization the search on the customers
> >>>>>>>>>> collection for any keyword became very fast again (less than 50ms).
> >>>>>>>>>> BUT strangely, the search in the policies collection became very
> >>>>>>>>>> slow (around 1200ms) without any changes to the policies collection.
> >>>>>>>>>>
> >>>>>>>>>> 5. Based on the above result, we decided to run the force optimize
> >>>>>>>>>> command on the policies collection (
> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false
> >>>>>>>>>> ). More surprisingly, after optimization the search on the policies
> >>>>>>>>>> collection for any keyword became very fast again (less than 50ms).
> >>>>>>>>>> BUT more strangely, the search in the customers collection again
> >>>>>>>>>> became very slow (around 1200ms) without any changes to the
> >>>>>>>>>> customers collection.
> >>>>>>>>>>
> >>>>>>>>>> What strange and unexpected behavior! If this is not a bug, how
> >>>>>>>>>> could you explain the above behavior in Solr 7.5? Could it be a
> >>>>>>>>>> bug?
> >>>>>>>>>>
> >>>>>>>>>> We would appreciate any support or help on our above situation.
> >>>>>>>>>>
> >>>>>>>>>> Thank you.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Edwin
> >>>>>>>>>>
> >>>>>>>>>> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <
> >>>>>>> edwinye...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Shawn,
> >>>>>>>>>>>
> >>>>>>>>>>>> If the two collections have data on the same server(s), I can
> >> see
> >>>>>>> this
> >>>>>>>>>>>> happening.  More memory is consumed when there is additional
> >>>>>>> data, and
> >>>>>>>>>>>> when Solr needs more memory, performance might be affected.
> The
> >>>>>>>>>>>> solution is generally to install more memory in the server.
> >>>>>>>>>>>
> >>>>>>>>>>> I have found that even after we delete the index in collection2,
> >>>>>>>>>>> the query QTime for collection1 still remains slow. It does not
> >>>>>>>>>>> go back to its previous fast speed from before we indexed
> >>>>>>>>>>> collection2.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Edwin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <
> >>>>>>> edwinye...@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Shawn,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for your reply.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The log only shows a list of entries like the following, and I
> >>>>>>>>>>>> don't see any other logs besides these.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330)
> >> [c:collectioin1
> >>>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> >>>>>>>>> update-script#processAdd:
> >>>>>>>>>>>> id=13245417
> >>>>>>>>>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330)
> >> [c:collectioin1
> >>>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> >>>>>>>>> update-script#processAdd:
> >>>>>>>>>>>> id=13245430
> >>>>>>>>>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330)
> >> [c:collectioin1
> >>>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
> >>>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> >>>>>>>>> update-script#processAdd:
> >>>>>>>>>>>> id=13245435
> >>>>>>>>>>>>
> >>>>>>>>>>>> There is no change to the segments info, but the slowdown in the
> >>>>>>>>>>>> first collection is very drastic.
> >>>>>>>>>>>> Before the indexing of collection2, the collection1 query QTimes
> >>>>>>>>>>>> are in the range of 4 to 50 ms. However, after indexing
> >>>>>>>>>>>> collection2, the collection1 query QTime increases to more than
> >>>>>>>>>>>> 1000 ms. The indexing is done from CSV data, and the size of the
> >>>>>>>>>>>> index is 3GB.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Edwin
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <
> apa...@elyograg.org
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> >>>>>>>>>>>>>> I am using Solr 7.5.0, and currently I am facing an issue
> >>>>>>>>>>>>>> where, when I am indexing in collection2, the indexing affects
> >>>>>>>>>>>>>> the records in collection1.
> >>>>>>>>>>>>>> Although the records are still intact, it seems that the
> >>>>>>>>>>>>>> termVectors settings get wiped out, and the index size of
> >>>>>>>>>>>>>> collection1 reduced from 3.3GB to 2.1GB after I do the
> >>>>>>>>>>>>>> indexing in collection2.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This should not be possible.  Indexing in one collection
> should
> >>>>>>> have
> >>>>>>>>>>>>> absolutely no effect on another collection.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If logging has been left at its default settings, the
> solr.log
> >>>>>>> file
> >>>>>>>>>>>>> should have enough info to show what actually happened.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Also, the search in collection1, which was originally very
> >>>>>>>>>>>>>> fast, becomes very slow after the indexing is done in
> >>>>>>>>>>>>>> collection2.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If the two collections have data on the same server(s), I can
> >>>>>>> see this
> >>>>>>>>>>>>> happening.  More memory is consumed when there is additional
> >>>>>>> data, and
> >>>>>>>>>>>>> when Solr needs more memory, performance might be affected.
> >> The
> >>>>>>>>>>>>> solution is generally to install more memory in the server.
> If
> >>>>>>> the
> >>>>>>>>>>>>> system is working, there should be no need to increase the
> heap
> >>>>>>> size
> >>>>>>>>>>>>> when the memory size increases ... but there can be
> situations
> >>>>>>> where
> >>>>>>>>> the
> >>>>>>>>>>>>> heap is a little bit too small, where you WOULD want to
> >> increase
> >>>>>>> the
> >>>>>>>>>>>>> heap size.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Shawn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>
> >>
>
>
