JSON Query DSL Param Dropping/Overwriting

2018-10-28 Thread Jason Gerlowski
Hi all,

Had a question about how parameters are combined/overlaid in the JSON
Query DSL.  Ran into some behavior that struck me as odd/maybe-buggy.

The query DSL allows params to be provided a few different ways:
1. As query-params in the URI (e.g. "/select?fq=inStock:true")
2. In the JSON request proper (e.g. "json={'filter': 'inStock:true'}")
3. In a special "params" block in the JSON (e.g. "json={'params':
{'fq': 'inStock:true'}}")

When the same parameter (e.g. fq/filter) is provided in more than one
syntax, Solr generally respects each of them, adding all filters to
the request.  But when a filter is present in the JSON "params" block,
it gets ignored or overwritten. (This is reflected in the results, but
it's also easy to tell from the "parsed_filter_queries" block when
requests are run with debug=true)  Does anyone know if this is
expected or a mistake?  It seems like a bug, but maybe I'm just not
seeing the use-case.

I've got a more detailed script showing the difference here for anyone
interested: https://pastebin.com/u1pdMvrq
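
For anyone who just wants the gist without the pastebin, here is a minimal
sketch of the three syntaxes. The field names are hypothetical, and the
merging function shows the behavior I *expected* (all filters applied,
none dropped) rather than what Solr actually does with the "params" block:

```python
# Three hypothetical ways of supplying filters to /select.
uri_params = {"fq": ["inStock:true"]}                            # 1. URI query-param
json_body = {"query": "*:*", "filter": ["popularity:[5 TO *]"]}  # 2. JSON request proper
json_params_block = {"params": {"fq": ["cat:electronics"]}}      # 3. JSON "params" block

def expected_filters(uri, body, params_block):
    """Collect filters from all three sources the way I expected Solr to
    combine them: every filter applied, none ignored or overwritten."""
    filters = []
    filters.extend(uri.get("fq", []))
    filters.extend(body.get("filter", []))
    filters.extend(params_block.get("params", {}).get("fq", []))
    return filters

print(expected_filters(uri_params, json_body, json_params_block))
```

Running the script in the pastebin shows the third source being dropped,
which is the behavior I'm asking about.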

Best,

Jason


RE: Tesseract language

2018-10-28 Thread Martin Frank Hansen (MHQ)
Hi Tim and Rohan,

Really appreciate your help, and I finally made it work (without tess4j).

It was the path environment variable that had the wrong setting. Instead of
setting TESSDATA_PREFIX to 'Tesseract-OCR/tessdata', I set it to the parent
folder 'Tesseract-OCR', and now it works for Danish.
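
In case it helps anyone hitting the same error, a small sanity check of the
layout Tesseract expects: with TESSDATA_PREFIX pointing at the parent
folder, the language files live one level down in tessdata/. The paths
below are illustrative, not from my actual machine:

```python
from pathlib import Path

def traineddata_path(tessdata_prefix, lang):
    """Where Tesseract looks for a language file when TESSDATA_PREFIX
    points at the parent install folder (e.g. 'Tesseract-OCR')."""
    return Path(tessdata_prefix) / "tessdata" / f"{lang}.traineddata"

# With the *parent* folder as prefix, 'dan' resolves under tessdata/:
print(traineddata_path("Tesseract-OCR", "dan"))
```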

Thanks again for helping.

Best regards

Martin

-Original Message-
From: Tim Allison 
Sent: 27. oktober 2018 14:37
To: solr-user@lucene.apache.org; u...@tika.apache.org
Subject: Re: Tesseract language

Martin,
  Let’s move this over to user@tika.

Rohan,
  Is there something about Tika’s use of tesseract for image files that can be 
improved?

Best,
   Tim

On Sat, Oct 27, 2018 at 3:40 AM Rohan Kasat  wrote:

> I used tess4j for image formats and Tika for scanned PDFs and images
> within PDFs.
>
> Regards,
> Rohan Kasat
>
> On Sat, Oct 27, 2018 at 12:39 AM Martin Frank Hansen (MHQ)
> 
> wrote:
>
> > Hi Rohan,
> >
> > Thanks for your reply, are you using tess4j with Tika or on its own?
> > I will take a look at tess4j if I can't make it work with Tika alone.
> >
> > Best regards
> > Martin
> >
> >
> > -Original Message-
> > From: Rohan Kasat 
> > Sent: 26. oktober 2018 21:45
> > To: solr-user@lucene.apache.org
> > Subject: Re: Tesseract language
> >
> > Hi Martin,
> >
> > Are you using it for image formats? I think you can try tess4j and
> > set TESSDATA_PREFIX as the home for the Tesseract configs.
> >
> > I have tried it and it works pretty well in my local machine.
> >
> > I have used Java 8 and Tesseract 3 for the same.
> >
> > Regards,
> > Rohan Kasat
> >
> > On Fri, Oct 26, 2018 at 12:31 PM Martin Frank Hansen (MHQ)
> > 
> > wrote:
> >
> > > Hi Tim,
> > >
> > > You were right.
> > >
> > > When I called `tesseract testing/eurotext.png testing/eurotext-dan
> > > -l dan`, I got an error message so I downloaded "dan.traineddata"
> > > and added it to the Tesseract-OCR/tessdata folder. Furthermore I
> > > added the 'TESSDATA_PREFIX' variable to the path-variables
> > > pointing to "Tesseract-OCR/tessdata".
> > >
> > > Now Tesseract works with the Danish language from the CMD, but I
> > > can't make the code work in Java, not even with default settings
> > > (which I could before). Am I missing something or just mixing some
> > > things up?
> > >
> > >
> > >
> > > -Original Message-
> > > From: Tim Allison 
> > > Sent: 26. oktober 2018 19:58
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Tesseract language
> > >
> > > Tika relies on you to install tesseract and all the language
> > > libraries you'll need.
> > >
> > > If you can successfully call `tesseract testing/eurotext.png
> > > testing/eurotext-dan -l dan`, Tika _should_ be able to specify "dan"
> > > with your code above.
> > > On Fri, Oct 26, 2018 at 10:49 AM Martin Frank Hansen (MHQ)
> > > 
> > > wrote:
> > > >
> > > > Hi again,
> > > >
> > > > Now I moved the OCR part to Tika, but I still can't make it work with
> > > > Danish. It works when using the default language settings, so it seems
> > > > like Tika is missing the Danish dictionary.
> > > >
> > > > My java code looks like this:
> > > >
> > > > // imports needed: org.apache.tika.metadata.Metadata,
> > > > // org.apache.tika.io.TikaInputStream, org.apache.tika.parser.AutoDetectParser,
> > > > // org.apache.tika.parser.ParseContext, org.apache.tika.parser.Parser,
> > > > // org.apache.tika.parser.ocr.TesseractOCRConfig, org.apache.tika.sax.BodyContentHandler
> > > > {
> > > >     File file = new File(pathfilename);
> > > >     Metadata meta = new Metadata();
> > > >     InputStream stream = TikaInputStream.get(file);
> > > >
> > > >     Parser parser = new AutoDetectParser();
> > > >     BodyContentHandler handler = new BodyContentHandler(Integer.MAX_VALUE);
> > > >
> > > >     TesseractOCRConfig config = new TesseractOCRConfig();
> > > >     config.setLanguage("dan"); // code works if this line is commented out
> > > >
> > > >     ParseContext parseContext = new ParseContext();
> > > >     parseContext.set(TesseractOCRConfig.class, config);
> > > >
> > > >     parser.parse(stream, handler, meta, parseContext);
> > > >     System.out.println(handler.toString());
> > > > }
> > > >
> > > > Hope that someone can help here.
> > > >
> > > > -Original Message-
> > > > From: Martin Frank Hansen (MHQ) 
> > > > Sent: 22. oktober 2018 07:58
> > > > To: solr-user@lucene.apache.org
> > > > Subject: SV: Tesseract language
> > > >
> > > > Hi Erick,
> > > >
> > > > Thanks for the help! I will take a look at it.
> > > >
> > > >
> > > > Martin Frank Hansen, Senior Data Analytiker
> > > >
> > > > Data, IM & Analytics
> > > >
> > > >
> > > >
> > > > Lautrupparken 40-42, DK-2750 Ballerup E-mail m...@kmd.dk  Web
> > > > www.kmd.dk Mobil +4525571418
> > > >
> > > > -Oprindelig meddelelse-
> > > > Fra: Erick Erickson 
> > > > Sendt: 21. oktober 2018 22:49
> > > > Til: solr-user 
> > > > Emne: Re: Tesseract language
> > > >
> > > > Here's a skeletal program that uses Tika in a stand-a

Re: searching is slow while adding document each time

2018-10-28 Thread Parag Shah
Hi Mugeesh,

Have you tried optimizing the index to see if performance improves? It is
well known that, as indexing goes on, Lucene creates more segments, all of
which must be searched, and hence searches take longer. Merging happens
constantly, but continuous indexing will still introduce small segments all
the time. Is running "optimize" periodically something you can afford to do?
If you have a master-slave setup separating indexers from searchers, you can
replicate on optimize at the master, removing the optimize load from the
searchers while still replicating to them periodically. That might help
reduce latency. Optimize merges segments and hence creates a more compact
index that is faster to search. It may cause somewhat higher latency right
after the replication, but that goes away once the in-memory caches are warm.
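
As a toy illustration of why continuous indexing leaves more segments to
search (the merge policy below is made up for the example; it is not
Lucene's actual TieredMergePolicy):

```python
def segments_after(batches, merge_factor=10):
    """Toy model: every indexed batch flushes one new segment; whenever
    `merge_factor` same-sized segments accumulate, they merge into one.
    Returns the number of live segments. Not Lucene's real policy."""
    segments = []
    for _ in range(batches):
        segments.append(1)  # flush one new small segment
        merged = True
        while merged:        # keep merging groups of equal-sized segments
            merged = False
            for size in set(segments):
                if segments.count(size) >= merge_factor:
                    for _ in range(merge_factor):
                        segments.remove(size)
                    segments.append(size * merge_factor)
                    merged = True
    return len(segments)

# Continuous indexing keeps a mix of small and large segments alive,
# so every search has to visit more of them:
print(segments_after(5), segments_after(567))
```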

What is the search count/sec you are seeing?

Regards
Parag

On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain  wrote:

> Hi,
>
> We are running 3 node solr cloud(4.4) in our production infrastructure, We
> recently moved our SOLR server host softlayer to digital ocean server with
> same configuration as production.
>
> Now we are seeing some slowness in the searchers when we index documents:
> when we stop indexing, searches are fine, but while we are adding documents
> they become slow. One Solr server does the indexing; the other two serve
> the search requests.
>
>
> I am just wondering why searches become slow while indexing, even though we
> are using the same configuration as we had in prod.
>
> We are pushing 500 documents at a time, and this process runs continuously
> (adding & deleting).
>
> these are the indexing logs
>
> 65497339 [http-apr-8980-exec-45] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> D1E52788A466E484 (1612655281636900864)]} 0 9
> 65497459 [http-apr-8980-exec-22] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> 50EF977E5E873065 (1612655281759584256)]} 0 9
> 65497572 [http-apr-8980-exec-40] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> 59C4A764BB50B13B (1612655281880170496)]} 0 9
> 65497724 [http-apr-8980-exec-31] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
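
A quick way to quantify indexing latency from logs like the ones quoted
above: each LogUpdateProcessor line ends with two numbers, which I read as
the status code and the elapsed milliseconds. That reading is an assumption
based only on the lines shown here, so treat the parser as a sketch:

```python
import re

# Abbreviated sample of a LogUpdateProcessor line from the logs above.
LOG_LINE = ("{add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63 "
            "(1612655281527848960)]} 0 9")

def parse_update_line(line):
    """Pull the added doc IDs and the two trailing numbers (which appear
    to be status and elapsed ms) out of a LogUpdateProcessor line."""
    ids = re.findall(r"([0-9A-F]{16}) \(\d+\)", line)
    status, elapsed = map(int, re.search(r"\}\s+(\d+)\s+(\d+)\s*$", line).groups())
    return ids, status, elapsed

ids, status, elapsed = parse_update_line(LOG_LINE)
print(len(ids), status, elapsed)
```

Aggregating the elapsed values over time would show whether update latency
itself degrades, or only the searches.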


Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
Do not run optimize (force merge) unless you really understand the downside.

If you are continually adding and deleting documents, you really do not want
to run optimize.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2018, at 9:24 AM, Parag Shah  wrote:

Re: searching is slow while adding document each time

2018-10-28 Thread Parag Shah
What would you do if your performance is degrading?

I am not suggesting doing this on a serving index, only on the master; the
optimized copy is what gets replicated. Am I missing something here?

On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood 
wrote:

> Do not run optimize (force merge) unless you really understand the
> downside.
>
> If you are continually adding and deleting documents, you really do not
> want
> to run optimize.

Re: searching is slow while adding document each time

2018-10-28 Thread Erick Erickson
Well, if you optimize on the master you'll inevitably copy the entire
index to each of the slaves. Consuming that much network bandwidth can
be A Bad Thing.

Here's the background for Walter's comment:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Solr 7.5 is much better about this:
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/

Even with the improvements in Solr 7.5, optimize is still a very
expensive operation and unless you've measured and can _prove_ it's
beneficial enough to be worth the cost you should avoid it.
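
To put rough numbers on the network cost (the index size and link speed
below are invented purely for illustration): shipping a freshly optimized
index to every slave adds up quickly, because the whole thing changes.

```python
def full_copy_seconds(index_gb, slaves, gbit_per_s):
    """Back-of-envelope: seconds to ship a freshly optimized index of
    `index_gb` gigabytes to `slaves` replicas over a `gbit_per_s` link."""
    total_bits = index_gb * 8 * 10**9 * slaves
    return total_bits / (gbit_per_s * 10**9)

# e.g. a hypothetical 50 GB index, 2 slaves, 1 Gbit/s uplink:
print(full_copy_seconds(50, 2, 1))
```

Incremental replication after normal merges only moves changed segments,
which is why the full-copy case after optimize stands out.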

Best,
Erick
On Sun, Oct 28, 2018 at 1:51 PM Parag Shah  wrote:
>
> What would you do if your performance is degrading?
>
> I am not suggesting doing this for a serving index. Only one at the Master,
> which ones optimized gets replicated. Am I missing something here?

Re: Solr IndexSearcher lifecycle

2018-10-28 Thread Edward Ribeiro
On Fri, Oct 26, 2018 at 10:38 AM Xiaolong Zheng <
xiaolong.zh...@mathworks.com> wrote:

Hi,

But when it comes to the Solr world, which is a Java webapp with a servlet
dispatcher, do we also keep reusing the same IndexSearcher instance as long
as the index does not change?


Yes. The IndexSearcher is in charge of processing all the requests. When a
commit opens a new searcher, a new IndexSearcher is created. While the new
searcher is being initialised, the old one continues to process requests
normally. Once the new IndexSearcher is initialised, the old one is
deregistered, finishes processing the remaining requests already sent to it,
and is finally shut down. The new IndexSearcher processes all new requests.
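
The handoff described above can be sketched as a tiny state model. This is
purely conceptual, not Solr's actual classes or API:

```python
class Searcher:
    """Conceptual stand-in for an IndexSearcher's lifecycle, not Solr code."""
    def __init__(self, name):
        self.name = name
        self.in_flight = 0      # requests currently being processed
        self.registered = False # whether new requests are routed here
        self.closed = False

    def handle(self):
        assert self.registered and not self.closed
        self.in_flight += 1

    def finish(self):
        self.in_flight -= 1
        if not self.registered and self.in_flight == 0:
            self.closed = True  # last pending request done -> shut down

class Core:
    def __init__(self):
        self.current = Searcher("s1")
        self.current.registered = True

    def commit(self):
        """Open a new searcher and swap; the old one keeps serving its
        in-flight requests until they drain, then closes."""
        new = Searcher(f"s{int(self.current.name[1:]) + 1}")
        old = self.current
        new.registered = True   # new searcher starts taking requests
        old.registered = False  # old one is deregistered...
        self.current = new
        if old.in_flight == 0:
            old.closed = True   # ...and closes once it is drained
```

The key property the model shows: a request that started on the old
searcher is never cut off by a commit.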

Edward


Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
The original question is about a three-node Solr Cloud cluster with
continuous updates. Optimize in this configuration won’t help; it will just
cause expensive merges later.

I would recommend upgrading from Solr 4.4; that is a very early release for
Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
releases, the replicas actually did more indexing work than the leader.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2018, at 2:13 PM, Erick Erickson  wrote:
> 
> Well, if you optimize on the master you'll inevitably copy the entire
> index to each of the slaves. Consuming that much network bandwidth can
> be A Bad Thing.
> 
> Here's the background for Walter's comment:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> 
> Solr 7.5 is much better about this:
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> 
> Even with the improvements in Solr 7.5, optimize is still a very
> expensive operation and unless you've measured and can _prove_ it's
> beneficial enough to be worth the cost you should avoid it.
> 
> Best,
> Erick

Re: searching is slow while adding document each time

2018-10-28 Thread Parag Shah
The original question, though, is about a performance issue in the searcher.
How would you improve that?

On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
wrote:

> The original question is for a three-node Solr Cloud cluster with
> continuous updates.
> Optimize in this configuration won’t help, it will just cause expensive
> merges later.
>
> I would recommend updating from Solr 4.4. that is a very early release for
> Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
> releases, the
> replicas actually did more indexing work than the leader.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 28, 2018, at 2:13 PM, Erick Erickson 
> wrote:
> >
> > Well, if you optimize on the master you'll inevitably copy the entire
> > index to each of the slaves. Consuming that much network bandwidth can
> > be A Bad Thing.
> >
> > Here's the background for Walter's comment:
> >
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >
> > Solr 7.5 is much better about this:
> >
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> >
> > Even with the improvements in Solr 7.5, optimize is still a very
> > expensive operation and unless you've measured and can _prove_ it's
> > beneficial enough to be worth the cost you should avoid it.
> >
> > Best,
> > Erick
> > On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
> wrote:
> >>
> >> What would you do if your performance is degrading?
> >>
> >> I am not suggesting doing this for a serving index. Only one at the
> Master,
> >> which ones optimized gets replicated. Am I missing something here?
> >>
> >> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>
> >>> Do not run optimize (force merge) unless you really understand the
> >>> downside.
> >>>
> >>> If you are continually adding and deleting documents, you really do not
> >>> want
> >>> to run optimize.
> >>>
> >>> wunder
> >>> Walter Underwood
> >>> wun...@wunderwood.org
> >>> http://observer.wunderwood.org/  (my blog)
> >>>
>  On Oct 28, 2018, at 9:24 AM, Parag Shah 
> wrote:
> 
>  Hi Mugeesh,
> 
>    Have you tried optimizing indexes to see if performance improves? It
> >>> is
>  well known that over time as indexing goes on lucene creates more
> >>> segments
>  which will be  searched over and hence take longer. Merging happens
>  constantly but continuous indexing will still introduce smaller
> segments
>  all the time. Have your tried running "optimize" periodically. Is it
>  something that you can afford to run? If you have a Master-Slave setup
> >>> for
>  Indexer v/s searchers, you can replicate on optimize in the Master,
> >>> thereby
>  removing the optimize load on the searchers, but replicate to the
> >>> searcher
>  periodically. That might help with reducing latency. Optimize merges
>  segments and hence creates a more compact index that is faster to
> search.
>  It may involve some higher latency temporarily right after the
> >>> replication,
>  but will go away soon after in-memory caches are full.
> 
>    What is the search count/sec you are seeing?
> 
>  Regards
>  Parag
> 
>  On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
> >>> wrote:
> 
> > Hi,
> >
> > We are running 3 node solr cloud(4.4) in our production
> infrastructure,
> >>> We
> > recently moved our SOLR server host softlayer to digital ocean server
> >>> with
> > same configuration as production.
> >
> > Now we are facing some slowness in the searcher when we index
> document,
> > when
> > we stop indexing then searches is fine, while adding document then it
> > become
> > slow. one of solr server we are indexing other 2 for searching the
> >>> request.
> >
> >
> > I am just wondering what was the reason searches become slow while
> >>> indexing
> > even we are using same configuration as we had in prod?
> >
> > at the time we are pushing 500 document at a time, this processing is
> > continuously running(adding & deleting)
> >
> > these are the indexing logs
> >
> > 65497339 [http-apr-8980-exec-45] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> >>> webapp=/solr
> > path=/update
> > params={distrib.from=
> >
> >>>
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> > (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> > B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> > (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> > DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> > (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> > D1E52788A466E484 (16

Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
Upgrade, so that indexing isn’t using as much CPU. That leaves more CPU for 
search.

Make sure you are on a recent release of Java. Run the G1 collector.

If you need more throughput, add more replicas or use instances with more CPUs.

Has the index gotten bigger since the move?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
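Walter's G1 suggestion is normally applied through the flags Solr's start script passes to the JVM; a minimal sketch, assuming an install whose `bin/solr` reads `GC_TUNE` from `solr.in.sh` (the numeric values below are illustrative starting points, not known-good settings for any particular index):

```shell
# solr.in.sh -- hedged sketch: switch the Solr JVM to the G1 collector.
# GC_TUNE is the hook bin/solr uses for collector flags in recent Solr
# releases; tune the pause target against your own latency measurements.
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:MaxGCPauseMillis=250"
```

Verify the flags actually reached the process with `ps -ef | grep UseG1GC` after a restart.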

> On Oct 28, 2018, at 8:21 PM, Parag Shah  wrote:
> 
> The original question though is about performance issue in the Searcher.
> How would you improve that?
> 
> On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
> wrote:
> 
>> The original question is for a three-node Solr Cloud cluster with
>> continuous updates.
>> Optimize in this configuration won’t help, it will just cause expensive
>> merges later.
>> 
>> I would recommend updating from Solr 4.4. that is a very early release for
>> Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
>> releases, the
>> replicas actually did more indexing work than the leader.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Oct 28, 2018, at 2:13 PM, Erick Erickson 
>> wrote:
>>> 
>>> Well, if you optimize on the master you'll inevitably copy the entire
>>> index to each of the slaves. Consuming that much network bandwidth can
>>> be A Bad Thing.
>>> 
>>> Here's the background for Walter's comment:
>>> 
>> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>>> 
>>> Solr 7.5 is much better about this:
>>> 
>> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
>>> 
>>> Even with the improvements in Solr 7.5, optimize is still a very
>>> expensive operation and unless you've measured and can _prove_ it's
>>> beneficial enough to be worth the cost you should avoid it.
>>> 
>>> Best,
>>> Erick
>>> On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
>> wrote:
 
 What would you do if your performance is degrading?
 
 I am not suggesting doing this for a serving index. Only one at the
>> Master,
 which ones optimized gets replicated. Am I missing something here?
 
 On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
>> wun...@wunderwood.org>
 wrote:
 
> Do not run optimize (force merge) unless you really understand the
> downside.
> 
> If you are continually adding and deleting documents, you really do not
> want
> to run optimize.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Oct 28, 2018, at 9:24 AM, Parag Shah 
>> wrote:
>> 
>> Hi Mugeesh,
>> 
>>  Have you tried optimizing indexes to see if performance improves? It
> is
>> well known that over time as indexing goes on lucene creates more
> segments
>> which will be  searched over and hence take longer. Merging happens
>> constantly but continuous indexing will still introduce smaller
>> segments
>> all the time. Have your tried running "optimize" periodically. Is it
>> something that you can afford to run? If you have a Master-Slave setup
> for
>> Indexer v/s searchers, you can replicate on optimize in the Master,
> thereby
>> removing the optimize load on the searchers, but replicate to the
> searcher
>> periodically. That might help with reducing latency. Optimize merges
>> segments and hence creates a more compact index that is faster to
>> search.
>> It may involve some higher latency temporarily right after the
> replication,
>> but will go away soon after in-memory caches are full.
>> 
>>  What is the search count/sec you are seeing?
>> 
>> Regards
>> Parag
>> 
>> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
> wrote:
>> 
>>> Hi,
>>> 
>>> We are running 3 node solr cloud(4.4) in our production
>> infrastructure,
> We
>>> recently moved our SOLR server host softlayer to digital ocean server
> with
>>> same configuration as production.
>>> 
>>> Now we are facing some slowness in the searcher when we index
>> document,
>>> when
>>> we stop indexing then searches is fine, while adding document then it
>>> become
>>> slow. one of solr server we are indexing other 2 for searching the
> request.
>>> 
>>> 
>>> I am just wondering what was the reason searches become slow while
> indexing
>>> even we are using same configuration as we had in prod?
>>> 
>>> at the time we are pushing 500 document at a time, this processing is
>>> continuously running(adding & deleting)
>>> 
>>> these are the indexing logs
>>> 
>>> 65497339 [http-apr-8980-exec-45] INFO
>>> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0]
> webapp=/solr
>>> path=/update
>>> params={distrib.from=
>>> 
> 
>> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&

Re: searching is slow while adding document each time

2018-10-28 Thread Erick Erickson
Put a profiler on it and see where the hot spots are?
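If a full profiler isn't available on the 4.4 boxes, repeated thread dumps are a crude substitute; a hedged sketch using `jstack` from the JDK (`parse_hot_frames` and `SOLR_PID` are made-up names for illustration) that counts which RUNNABLE top frames recur across samples:

```shell
# Poor-man's profiler: across several jstack samples, count the top stack
# frame of each RUNNABLE thread. Frames that recur are likely hot spots.
parse_hot_frames() {
  awk '/java.lang.Thread.State: RUNNABLE/ { r = 1; next }
       r && /^[[:space:]]*at / {
         sub(/^[[:space:]]*at /, ""); sub(/\(.*/, "")  # keep class.method only
         count[$0]++; r = 0
       }
       END { for (f in count) print count[f], f }' | sort -rn
}

# Usage against a live JVM (SOLR_PID is the Solr process id):
#   for i in 1 2 3 4 5; do jstack "$SOLR_PID"; sleep 2; done | parse_hot_frames | head
```

This only samples where threads happen to be, so treat it as a pointer toward the hot code, not a measurement.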
On Sun, Oct 28, 2018 at 8:27 PM Walter Underwood  wrote:
>
> Upgrade, so that indexing isn’t using as much CPU. That leaves more CPU for 
> search.
>
> Make sure you are on a recent release of Java. Run the G1 collector.
>
> If you need more throughput, add more replicas or use instance with more CPUs.
>
> Has the index gotten bigger since the move?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 28, 2018, at 8:21 PM, Parag Shah  wrote:
> >
> > The original question though is about performance issue in the Searcher.
> > How would you improve that?
> >
> > On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
> > wrote:
> >
> >> The original question is for a three-node Solr Cloud cluster with
> >> continuous updates.
> >> Optimize in this configuration won’t help, it will just cause expensive
> >> merges later.
> >>
> >> I would recommend updating from Solr 4.4. that is a very early release for
> >> Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
> >> releases, the
> >> replicas actually did more indexing work than the leader.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Oct 28, 2018, at 2:13 PM, Erick Erickson 
> >> wrote:
> >>>
> >>> Well, if you optimize on the master you'll inevitably copy the entire
> >>> index to each of the slaves. Consuming that much network bandwidth can
> >>> be A Bad Thing.
> >>>
> >>> Here's the background for Walter's comment:
> >>>
> >> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >>>
> >>> Solr 7.5 is much better about this:
> >>>
> >> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> >>>
> >>> Even with the improvements in Solr 7.5, optimize is still a very
> >>> expensive operation and unless you've measured and can _prove_ it's
> >>> beneficial enough to be worth the cost you should avoid it.
> >>>
> >>> Best,
> >>> Erick
> >>> On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
> >> wrote:
> 
>  What would you do if your performance is degrading?
> 
>  I am not suggesting doing this for a serving index. Only one at the
> >> Master,
>  which ones optimized gets replicated. Am I missing something here?
> 
>  On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
> >> wun...@wunderwood.org>
>  wrote:
> 
> > Do not run optimize (force merge) unless you really understand the
> > downside.
> >
> > If you are continually adding and deleting documents, you really do not
> > want
> > to run optimize.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Oct 28, 2018, at 9:24 AM, Parag Shah 
> >> wrote:
> >>
> >> Hi Mugeesh,
> >>
> >>  Have you tried optimizing indexes to see if performance improves? It
> > is
> >> well known that over time as indexing goes on lucene creates more
> > segments
> >> which will be  searched over and hence take longer. Merging happens
> >> constantly but continuous indexing will still introduce smaller
> >> segments
> >> all the time. Have your tried running "optimize" periodically. Is it
> >> something that you can afford to run? If you have a Master-Slave setup
> > for
> >> Indexer v/s searchers, you can replicate on optimize in the Master,
> > thereby
> >> removing the optimize load on the searchers, but replicate to the
> > searcher
> >> periodically. That might help with reducing latency. Optimize merges
> >> segments and hence creates a more compact index that is faster to
> >> search.
> >> It may involve some higher latency temporarily right after the
> > replication,
> >> but will go away soon after in-memory caches are full.
> >>
> >>  What is the search count/sec you are seeing?
> >>
> >> Regards
> >> Parag
> >>
> >> On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
> > wrote:
> >>
> >>> Hi,
> >>>
> >>> We are running 3 node solr cloud(4.4) in our production
> >> infrastructure,
> > We
> >>> recently moved our SOLR server host softlayer to digital ocean server
> > with
> >>> same configuration as production.
> >>>
> >>> Now we are facing some slowness in the searcher when we index
> >> document,
> >>> when
> >>> we stop indexing then searches is fine, while adding document then it
> >>> become
> >>> slow. one of solr server we are indexing other 2 for searching the
> > request.
> >>>
> >>>
> >>> I am just wondering what was the reason searches become slow while
> > indexing
> >>> even we are using same configuration as we had in prod?
> >>>
> >>> at the time we are pushing 500 document at a time, this processing is
> 

Re: searching is slow while adding document each time

2018-10-28 Thread Walter Underwood
Do you really think running a profiler on 4.4 will be more effective than 
upgrading to 7.x?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 28, 2018, at 8:53 PM, Erick Erickson  wrote:
> 
> Put a profiler on it and see where the hot spots are?
> On Sun, Oct 28, 2018 at 8:27 PM Walter Underwood  
> wrote:
>> 
>> Upgrade, so that indexing isn’t using as much CPU. That leaves more CPU for 
>> search.
>> 
>> Make sure you are on a recent release of Java. Run the G1 collector.
>> 
>> If you need more throughput, add more replicas or use instance with more 
>> CPUs.
>> 
>> Has the index gotten bigger since the move?
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Oct 28, 2018, at 8:21 PM, Parag Shah  wrote:
>>> 
>>> The original question though is about performance issue in the Searcher.
>>> How would you improve that?
>>> 
>>> On Sun, Oct 28, 2018 at 4:37 PM Walter Underwood 
>>> wrote:
>>> 
 The original question is for a three-node Solr Cloud cluster with
 continuous updates.
 Optimize in this configuration won’t help, it will just cause expensive
 merges later.
 
 I would recommend updating from Solr 4.4. that is a very early release for
 Solr Cloud. We saw dramatic speedups in indexing with 6.x. In early
 releases, the
 replicas actually did more indexing work than the leader.
 
 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)
 
> On Oct 28, 2018, at 2:13 PM, Erick Erickson 
 wrote:
> 
> Well, if you optimize on the master you'll inevitably copy the entire
> index to each of the slaves. Consuming that much network bandwidth can
> be A Bad Thing.
> 
> Here's the background for Walter's comment:
> 
 https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> 
> Solr 7.5 is much better about this:
> 
 https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> 
> Even with the improvements in Solr 7.5, optimize is still a very
> expensive operation and unless you've measured and can _prove_ it's
> beneficial enough to be worth the cost you should avoid it.
> 
> Best,
> Erick
> On Sun, Oct 28, 2018 at 1:51 PM Parag Shah 
 wrote:
>> 
>> What would you do if your performance is degrading?
>> 
>> I am not suggesting doing this for a serving index. Only one at the
 Master,
>> which ones optimized gets replicated. Am I missing something here?
>> 
>> On Sun, Oct 28, 2018 at 11:05 AM Walter Underwood <
 wun...@wunderwood.org>
>> wrote:
>> 
>>> Do not run optimize (force merge) unless you really understand the
>>> downside.
>>> 
>>> If you are continually adding and deleting documents, you really do not
>>> want
>>> to run optimize.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
 On Oct 28, 2018, at 9:24 AM, Parag Shah 
 wrote:
 
 Hi Mugeesh,
 
 Have you tried optimizing indexes to see if performance improves? It
>>> is
 well known that over time as indexing goes on lucene creates more
>>> segments
 which will be  searched over and hence take longer. Merging happens
 constantly but continuous indexing will still introduce smaller
 segments
 all the time. Have your tried running "optimize" periodically. Is it
 something that you can afford to run? If you have a Master-Slave setup
>>> for
 Indexer v/s searchers, you can replicate on optimize in the Master,
>>> thereby
 removing the optimize load on the searchers, but replicate to the
>>> searcher
 periodically. That might help with reducing latency. Optimize merges
 segments and hence creates a more compact index that is faster to
 search.
 It may involve some higher latency temporarily right after the
>>> replication,
 but will go away soon after in-memory caches are full.
 
 What is the search count/sec you are seeing?
 
 Regards
 Parag
 
 On Wed, Sep 26, 2018 at 2:02 AM Mugeesh Husain 
>>> wrote:
 
> Hi,
> 
> We are running 3 node solr cloud(4.4) in our production
 infrastructure,
>>> We
> recently moved our SOLR server host softlayer to digital ocean server
>>> with
> same configuration as production.
> 
> Now we are facing some slowness in the searcher when we index
 document,
> when
> we stop indexing then searches is fine, while adding document then it
> become
> slow. one of solr server we are indexing other 2 for searchin

Re: searching is slow while adding document each time

2018-10-28 Thread Deepak Goel
What are your hardware utilisations (CPU, memory, disk, network)?

I think you might have to tune Lucene too.
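A quick way to put a number on CPU utilisation, as a Linux-only sketch (it reads `/proc/stat`, so it won't work elsewhere; `read_cpu` and `cpu_pct` are made-up helper names):

```shell
# CPU busy % between two /proc/stat samples (Linux only).
# On the aggregate "cpu" line, field 5 is idle time; treat the rest as busy.
read_cpu() {
  awk '/^cpu / { total = 0
                 for (i = 2; i <= NF; i++) total += $i
                 print total - $5, total }' /proc/stat
}
# busy1 total1 busy2 total2 -> percentage busy over the interval
cpu_pct() {
  awk -v b1="$1" -v t1="$2" -v b2="$3" -v t2="$4" \
      'BEGIN { printf "%.1f\n", 100 * (b2 - b1) / (t2 - t1) }'
}

# Usage: sample twice, two seconds apart.
#   set -- $(read_cpu); sleep 2; set -- "$@" $(read_cpu); cpu_pct "$@"
```

For disk and network, `iostat -x` and `sar -n DEV` from the sysstat package are the usual companions.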

On Wed, 26 Sep 2018, 14:33 Mugeesh Husain,  wrote:

> Hi,
>
> We are running 3 node solr cloud(4.4) in our production infrastructure, We
> recently moved our SOLR server host softlayer to digital ocean server with
> same configuration as production.
>
> Now we are facing some slowness in the searcher when we index document,
> when
> we stop indexing then searches is fine, while adding document then it
> become
> slow. one of solr server we are indexing other 2 for searching the request.
>
>
> I am just wondering what was the reason searches become slow while indexing
> even we are using same configuration as we had in prod?
>
> at the time we are pushing 500 document at a time, this processing is
> continuously running(adding & deleting)
>
> these are the indexing logs
>
> 65497339 [http-apr-8980-exec-45] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> D1E52788A466E484 (1612655281636900864)]} 0 9
> 65497459 [http-apr-8980-exec-22] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> 50EF977E5E873065 (1612655281759584256)]} 0 9
> 65497572 [http-apr-8980-exec-40] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> 59C4A764BB50B13B (1612655281880170496)]} 0 9
> 65497724 [http-apr-8980-exec-31] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> path=/update
> params={distrib.from=
> http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> }
> {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: searching is slow while adding document each time

2018-10-28 Thread Erick Erickson
bq. Do you really think running a profiler on 4.4 will be more
effective than upgrading to 7.x?

No but it's better than random speculation.
On Sun, Oct 28, 2018 at 9:34 PM Deepak Goel  wrote:
>
> What are your hardware utilisations (cpu, memory, disk, network)?
>
> I think you might have to tune lucene too
>
> On Wed, 26 Sep 2018, 14:33 Mugeesh Husain,  wrote:
>
> > Hi,
> >
> > We are running 3 node solr cloud(4.4) in our production infrastructure, We
> > recently moved our SOLR server host softlayer to digital ocean server with
> > same configuration as production.
> >
> > Now we are facing some slowness in the searcher when we index document,
> > when
> > we stop indexing then searches is fine, while adding document then it
> > become
> > slow. one of solr server we are indexing other 2 for searching the request.
> >
> >
> > I am just wondering what was the reason searches become slow while indexing
> > even we are using same configuration as we had in prod?
> >
> > at the time we are pushing 500 document at a time, this processing is
> > continuously running(adding & deleting)
> >
> > these are the indexing logs
> >
> > 65497339 [http-apr-8980-exec-45] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
> > (1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
> > B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
> > (1612655281566646272), 8D15813305BF7417 (1612655281584472064),
> > DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
> > (1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
> > D1E52788A466E484 (1612655281636900864)]} 0 9
> > 65497459 [http-apr-8980-exec-22] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
> > (1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
> > 1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
> > (1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
> > 68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
> > (1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
> > 50EF977E5E873065 (1612655281759584256)]} 0 9
> > 65497572 [http-apr-8980-exec-40] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> > (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> > 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> > (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> > 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> > (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> > 59C4A764BB50B13B (1612655281880170496)]} 0 9
> > 65497724 [http-apr-8980-exec-31] INFO
> > org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
> > path=/update
> > params={distrib.from=
> > http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe
> > }
> > {add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
> > (1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
> > AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
> > (1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
> > 9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
> > (1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
> > 5131AEC4B87FBFE9 (1612655282037456896)]} 0 10
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >