Replacing Double Quotes from a field

2018-09-03 Thread Gopesh Sharma
Hello All,

I am trying to remove the double quotes from a field, and that's why I've written a
PatternReplaceCharFilterFactory, but it doesn't seem to be working.



I also tried to replace it at query time, but Solr throws an error that the
entity must be closed with >

select replace(t.name, '\"', '') as NAME from wp_31_term_taxonomy

Please let me know if I am doing anything wrong.


Re: Is that a mistake or bug?

2018-09-03 Thread Mikhail Khludnev
Nope. In that case, it would respond with terminatedEarly=false even if no one
requested it.
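
In other words, the three states carry this information:

// null  -> segmentTerminateEarly was not requested; the key is omitted
//          from the response
// false -> it was requested, but the search was not terminated early
// true  -> it was requested, and the search was terminated early within
//          a segment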

On Mon, Sep 3, 2018 at 9:09 AM zhenyuan wei  wrote:

> Yeah,got it~. So the QueryResult.segmentTerminatedEarly maybe a boolean,
> instead of Boolean,  is better, right?
>
> Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 1:36 PM:
>
> > It's neither, it's on purpose. By default  result.segmentTerminatedEarly
> is
> > null, hence it doesn't appear in result output. see
> > ResponseBuilder.setResult(QueryResult).
> > So, if cmd requests early termination, it sets false by default, enabling
> > "false" output even it won't be the case. And later it might be flipped
> to
> > true.
> >
> >
> > On Mon, Sep 3, 2018 at 5:57 AM zhenyuan wei  wrote:
> >
> > > Hi all,
> > > I saw the code like following:
> > >
> > > QueryResult result = new QueryResult();
> > >
> > >
> > >
> >
> cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
> > > CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));
> > > if (cmd.getSegmentTerminateEarly()) {
> > >   result.setSegmentTerminatedEarly(Boolean.FALSE);
> > > }
> > >
> > > It says if request's param segmentTerminateEarly=true, which means
> search
> > > maybe terminated early within a segment,  then set
> > > result.setSegmentTerminatedEarly as false , this code is of a little
> > > confusion
> > > .
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: change DocExpirationUpdateProcessorFactory deleteByQuery NOW parameter time zone

2018-09-03 Thread Derek Poh

SG refers to Singapore, and the time zone is UTC+8.

That means I need to set the P_TradeShowOnlineEndDate date in UTC
instead of UTC+8 as a workaround.


On 31/8/2018 10:16 PM, Shawn Heisey wrote:

On 8/30/2018 7:26 PM, Derek Poh wrote:
Can the timezone of the NOW parameter in the |deleteByQuery| of the
DocExpirationUpdateProcessorFactory be changed to my timezone?


I am in SG and using solr 6.5.1.


I do not know what SG is.

The timezone cannot be changed.  Solr *always* handles dates in UTC.  
You can assign a timezone when doing date math, but this is only used 
to determine when a new day or week starts -- the dates themselves 
will be in UTC.
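
For example, with the field from your earlier message (the zone ID is just an
illustration):

fq=P_TradeShowOnlineEndDate:[* TO NOW/DAY]&TZ=Asia/Singapore

Here NOW/DAY rounds to the start of the current day in Singapore time, but the
resulting boundary is still a UTC timestamp.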


Thanks,
Shawn






Re: Is that a mistake or bug?

2018-09-03 Thread zhenyuan wei
I mean: if terminatedEarly were a primitive boolean type, there would be no need
to explicitly assign Boolean.FALSE, because a primitive boolean defaults to false.

Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 4:13 PM:

> Nope. In this case, it will respond terminatedEarly=false even if noone
> request it.
>
> On Mon, Sep 3, 2018 at 9:09 AM zhenyuan wei  wrote:
>
> > Yeah,got it~. So the QueryResult.segmentTerminatedEarly maybe a boolean,
> > instead of Boolean,  is better, right?
> >
> > Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 1:36 PM:
> >
> > > It's neither, it's on purpose. By default
> result.segmentTerminatedEarly
> > is
> > > null, hence it doesn't appear in result output. see
> > > ResponseBuilder.setResult(QueryResult).
> > > So, if cmd requests early termination, it sets false by default,
> enabling
> > > "false" output even it won't be the case. And later it might be flipped
> > to
> > > true.
> > >
> > >
> > > On Mon, Sep 3, 2018 at 5:57 AM zhenyuan wei  wrote:
> > >
> > > > Hi all,
> > > > I saw the code like following:
> > > >
> > > > QueryResult result = new QueryResult();
> > > >
> > > >
> > > >
> > >
> >
> cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
> > > > CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));
> > > > if (cmd.getSegmentTerminateEarly()) {
> > > >   result.setSegmentTerminatedEarly(Boolean.FALSE);
> > > > }
> > > >
> > > > It says if request's param segmentTerminateEarly=true, which means
> > search
> > > > maybe terminated early within a segment,  then set
> > > > result.setSegmentTerminatedEarly as false , this code is of a little
> > > > confusion
> > > > .
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Is that a mistake or bug?

2018-09-03 Thread p.bodnar
Hi, really nope :) Because, as MK writes below, result.segmentTerminatedEarly is
used as a 3-state variable.

The only line that could be improved is probably replacing "Boolean.FALSE" with
simply "false", but that is really a minor thing...

Regards

PB
__
> From: "zhenyuan wei" 
> To: solr-user@lucene.apache.org
> Date: 03.09.2018 10:24
> Subject: Re: Is that a mistake or bug?
>
>I mean, use terminatedEarly as basic boolean type, then  no need to explicitly
>assign it as Boolean.FALSE,  because basic boolean's default value is false.
>
>Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 4:13 PM:
>
>> Nope. In this case, it will respond terminatedEarly=false even if noone
>> request it.
>>
>> On Mon, Sep 3, 2018 at 9:09 AM zhenyuan wei  wrote:
>>
>> > Yeah,got it~. So the QueryResult.segmentTerminatedEarly maybe a boolean,
>> > instead of Boolean,  is better, right?
>> >
>> > Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 1:36 PM:
>> >
>> > > It's neither, it's on purpose. By default
>> result.segmentTerminatedEarly
>> > is
>> > > null, hence it doesn't appear in result output. see
>> > > ResponseBuilder.setResult(QueryResult).
>> > > So, if cmd requests early termination, it sets false by default,
>> enabling
>> > > "false" output even it won't be the case. And later it might be flipped
>> > to
>> > > true.
>> > >
>> > >
>> > > On Mon, Sep 3, 2018 at 5:57 AM zhenyuan wei  wrote:
>> > >
>> > > > Hi all,
>> > > > I saw the code like following:
>> > > >
>> > > > QueryResult result = new QueryResult();
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
>> > > > CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));
>> > > > if (cmd.getSegmentTerminateEarly()) {
>> > > >   result.setSegmentTerminatedEarly(Boolean.FALSE);
>> > > > }
>> > > >
>> > > > It says if request's param segmentTerminateEarly=true, which means
>> > search
>> > > > maybe terminated early within a segment,  then set
>> > > > result.setSegmentTerminatedEarly as false , this code is of a little
>> > > > confusion
>> > > > .
>> > > >
>> > >
>> > >
>> > > --
>> > > Sincerely yours
>> > > Mikhail Khludnev
>> > >
>> >
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>
>


Re: Is that a mistake or bug?

2018-09-03 Thread zhenyuan wei
Oh, I feel embarrassed to explain it again; maybe my English is not so good.
What I actually mean is: what if QueryResult.segmentTerminatedEarly were
declared as a primitive boolean, not a Boolean, in QueryResult:

public class QueryResult {
    private boolean partialResults;
    // private Boolean segmentTerminatedEarly;  ->  proposed:
    private boolean segmentTerminatedEarly;
    ...
}

then in the QueryComponent.process() method, code like the following:

QueryResult result = new QueryResult();
cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
    CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));

// this if block could then be deleted:
if (cmd.getSegmentTerminateEarly()) {
  result.setSegmentTerminatedEarly(Boolean.FALSE);
}

 wrote on Mon, Sep 3, 2018 at 4:52 PM:

> Hi, really nope :) Because as MK writes below,
> result.segmentTerminatedEarly is used as a 3-state variable.
>
> The only line that could be improved, is probably replacing
> "Boolean.FALSE" by simply "false", but that is really a minor thing...
>
> Regards
>
> PB
> __
> > From: "zhenyuan wei" 
> > To: solr-user@lucene.apache.org
> > Date: 03.09.2018 10:24
> > Subject: Re: Is that a mistake or bug?
> >
> >I mean, use terminatedEarly as basic boolean type, then  no need to
> explicitly
> >assign it as Boolean.FALSE,  because basic boolean's default value is
> false.
> >
> >Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 4:13 PM:
> >
> >> Nope. In this case, it will respond terminatedEarly=false even if noone
> >> request it.
> >>
> >> On Mon, Sep 3, 2018 at 9:09 AM zhenyuan wei  wrote:
> >>
> >> > Yeah,got it~. So the QueryResult.segmentTerminatedEarly maybe a
> boolean,
> >> > instead of Boolean,  is better, right?
> >> >
> >> > Mikhail Khludnev wrote on Mon, Sep 3, 2018 at 1:36 PM:
> >> >
> >> > > It's neither, it's on purpose. By default
> >> result.segmentTerminatedEarly
> >> > is
> >> > > null, hence it doesn't appear in result output. see
> >> > > ResponseBuilder.setResult(QueryResult).
> >> > > So, if cmd requests early termination, it sets false by default,
> >> enabling
> >> > > "false" output even it won't be the case. And later it might be
> flipped
> >> > to
> >> > > true.
> >> > >
> >> > >
> >> > > On Mon, Sep 3, 2018 at 5:57 AM zhenyuan wei 
> wrote:
> >> > >
> >> > > > Hi all,
> >> > > > I saw the code like following:
> >> > > >
> >> > > > QueryResult result = new QueryResult();
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
> >> > > > CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));
> >> > > > if (cmd.getSegmentTerminateEarly()) {
> >> > > >   result.setSegmentTerminatedEarly(Boolean.FALSE);
> >> > > > }
> >> > > >
> >> > > > It says if request's param segmentTerminateEarly=true, which means
> >> > search
> >> > > > maybe terminated early within a segment,  then set
> >> > > > result.setSegmentTerminatedEarly as false , this code is of a
> little
> >> > > > confusion
> >> > > > .
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Sincerely yours
> >> > > Mikhail Khludnev
> >> > >
> >> >
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >>
> >
> >
>


Re: MLT in Cloud Mode - Not Returning Fields?

2018-09-03 Thread Charlie Hull

On 31/08/2018 19:36, Doug Turnbull wrote:

Hello,

We're working on a Solr More Like This project (Solr 6.6.2), using the More
Like This searchComponent. What we note is in standalone Solr, when we
request MLT using the search component, we get every more like this
document fully formed with complete fields in the moreLikeThis section.


Hey Doug,

IIRC there wasn't a lot of support for MLT in cloud mode a few years 
ago, and there are certainly still a few open issues around cloud support:

https://issues.apache.org/jira/browse/SOLR-4414
https://issues.apache.org/jira/browse/SOLR-5480
Maybe there are some hints in the ticket comments about different ways 
to do what you want.


Cheers

Charlie



In cloud, however, with the exact same query and config, we only get the
doc ids under "moreLikeThis" requiring us to fetch the metadata associated
with each document.

I can't easily share an example due to confidentiality, but I want to check
if we're missing something? Documentation doesn't mention any limitations.
The only interesting note I've found is this one which points to a
potential difference in behavior


  The Cloud MLT Query Parser uses the realtime get handler to retrieve the

fields to be mined for keywords. Because of the way the realtime get
handler is implemented, it does not return data for fields populated using
copyField.

https://stackoverflow.com/a/46307140/8123

Any thoughts?

-Doug




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Multiple solr instances per host vs Multiple cores in same solr instance

2018-09-03 Thread Toke Eskildsen
On Tue, 2018-08-28 at 09:37 +0200, Bernd Fehling wrote:
> Yes, I tested many cases.

Erick is absolutely right about the challenge of finding "best" setups.
What we can do is gather observations, as you have done, and hope that
people with similar use cases find them. With that in mind, have you
considered posting a write-up of your hard work somewhere? It seems a
shame only to have it as an input on this mailing list.

- Toke Eskildsen, Royal Danish Library



Storing PID below /run

2018-09-03 Thread Andreas Hubold

Hi,

we'd like to store the PID file for the Solr service in a directory 
below the /run directory (CentOS 7.5).


I've set "SOLR_PID_DIR=/run/solr" in solr.in.sh. But if /run is mounted 
as tmpfs, the directory /run/solr will not exist after boot and the pid 
file cannot be stored in that directory. I can't use /run instead 
(without a subdirectory), because /run is only writable by the root user.


It would be nice if the Solr init script could create the SOLR_PID_DIR
with permissions for the Solr user if it does not exist. What do you
think? I could create a JIRA ticket with a simple patch if nobody comes
up with a better idea.
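
In the meantime I will probably work around it with a systemd-tmpfiles
snippet (assuming a "solr" user and group):

# /etc/tmpfiles.d/solr.conf
d /run/solr 0755 solr solr -

systemd-tmpfiles would then recreate /run/solr with those permissions on
every boot.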


Thanks in advance!
- Andreas




Boost only first 10 records

2018-09-03 Thread mama
Hi,
We have a requirement to boost only the first few records; the rest of the
results should be ordered as per the normal search.
E.g., if I have books of different genres and a user searches for some book
(interested in genre: comedy), then we want to show, say, the first 3 records
of genre:comedy, and the rest of the results should be of different genres.
The reason for this is that we have lots of books in the DB; if we boost the
comedy genre, then the first hundreds of records will be comedy and the user
may not be aware of other books.
Is it possible?

Query for boosting genre comedy:
 genre:comedy^0.5

Can someone help with the requirement of limiting the boost to the first few
records?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Multiple solr instances per host vs Multiple cores in same solr instance

2018-09-03 Thread Bernd Fehling

Yes, that's right: there is no "best" setup at all, only one that
gives the most advantage for your requirements.
And any setup has some disadvantages.

Currently I'm short on time and have to bring our Cloud to production,
but a write-up is in the queue, as already done for other developments:
https://www.ub.uni-bielefeld.de/~befehl/base/solr/index.html

Regards
Bernd


On 03.09.2018 at 11:33, Toke Eskildsen wrote:

On Tue, 2018-08-28 at 09:37 +0200, Bernd Fehling wrote:

Yes, I tested many cases.


Erick is absolutely right about the challenge of finding "best" setups.
What we can do is gather observations, as you have done, and hope that
people with similar use cases find them. With that in mind, have you
considered posting a write-up of your hard work somewhere? It seems a
shame only to have it as an input on this mailing list.

- Toke Eskildsen, Royal Danish Library



Re: Boost only first 10 records

2018-09-03 Thread Emir Arnautović
Hi,
The requirement is not 100% clear or logical. If user selects filter 
type:comedy, it does not make sense to show anything else. You might have 
“Other categories relavant results” and that can be done as a separate query. 
It seems that you want to prefer comedy, but you have an issue with boosting it 
too much results in only comedy top results and boosting it too little does not 
result in comedy being top hit all the time. Boosting is usually used to prefer 
one type if there are similar results but that does not guaranty that they will 
be top all the time. Your options are:
1. tune boost parameter so the results are as expected in most times (it will 
never be all the times)
2. use collapse (group) feature to make sure you get results from all categories
3. have two queries and combine results on UI side
4. use faceting in combination with query and let user choose genre.
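
For option 3, a rough sketch of the two requests (the parameter values are
only an example):

q=<user query>&fq=genre:comedy&rows=3
q=<user query>&fq=-genre:comedy&rows=10

The UI then renders the first result block above the second.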

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 3 Sep 2018, at 08:48, mama  wrote:
> 
> Hi 
> We have requirement to boost only first few records & rest of result should
> be as per search.
> e.g. if i have books of different genre & if user search for some book
> (intrested in genere : comedy) then 
> we want to show say first 3 records of genre:comedy and rest of results
> should be of diff genre .
> Reason for this is , we have lots of books in db , if we boost comedy genre
> then first 100s of records will be comedy and user may not be aware of other
> books.
> is it possible ?
> 
> Query for boosting genre comedy
> genre:comedy^0.5
> 
> can someone help with requirement of limiting boost to first few records ?
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Boost only first 10 records

2018-09-03 Thread Rahul Singh
I agree; the two-query solution is the simplest to implement, and you have much
more control on the UI as well. It seems you want to have a “featured” set of
results above, and separate from, the organic results from the index.

You could choose to request only specific fields in the “featured” query.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Sep 3, 2018, 6:29 AM -0400, Emir Arnautović , 
wrote:
> Hi,
> The requirement is not 100% clear or logical. If user selects filter 
> type:comedy, it does not make sense to show anything else. You might have 
> “Other categories relavant results” and that can be done as a separate query. 
> It seems that you want to prefer comedy, but you have an issue with boosting 
> it too much results in only comedy top results and boosting it too little 
> does not result in comedy being top hit all the time. Boosting is usually 
> used to prefer one type if there are similar results but that does not 
> guaranty that they will be top all the time. Your options are:
> 1. tune boost parameter so the results are as expected in most times (it will 
> never be all the times)
> 2. use collapse (group) feature to make sure you get results from all 
> categories
> 3. have two queries and combine results on UI side
> 4. use faceting in combination with query and let user choose genre.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 3 Sep 2018, at 08:48, mama  wrote:
> >
> > Hi
> > We have requirement to boost only first few records & rest of result should
> > be as per search.
> > e.g. if i have books of different genre & if user search for some book
> > (intrested in genere : comedy) then
> > we want to show say first 3 records of genre:comedy and rest of results
> > should be of diff genre .
> > Reason for this is , we have lots of books in db , if we boost comedy genre
> > then first 100s of records will be comedy and user may not be aware of other
> > books.
> > is it possible ?
> >
> > Query for boosting genre comedy
> > genre:comedy^0.5
> >
> > can someone help with requirement of limiting boost to first few records ?
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


AW: Solr suggestions: why are exact matches omitted

2018-09-03 Thread Clemens Wyss DEV
Sorry for not giving up on this issue:
is this "behavior" a feature or a bug?

-Original Message-
From: Clemens Wyss DEV 
Sent: Thursday, 30 August 2018 18:01
To: 'solr-user@lucene.apache.org' 
Subject: Solr suggestions: why are exact matches omitted

Given the following configuration:
...
<searchComponent name="suggest_word_fuzzy" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest_word_fuzzy</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
    <str name="buildOnCommit">true</str>
    <str name="field">_my_suggest_word</str>
    <int name="maxEdits">2</int>
    <float name="threshold">0.01</float>
    <float name="maxQueryFrequency">.01</float>
    <str name="suggestAnalyzerFieldType">suggest_word</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest_word_fuzzy" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.collate">true</str>
  </lst>
</requestHandler>
...
When I try to find suggestions for "11000.35" I get 
"11000.33"
"11000.34"
"11000.36"
"11000.37"
...
but not "11000.35", although "11000.35" exists (and is suggested when I for 
example type "11000.34")

Thx in advance
- Clemens


Re: Boost only first 10 records

2018-09-03 Thread Mikhail Khludnev
Hello,

I can hardly follow, but the subject sounds like re-ranking:
http://people.apache.org/~mkhl/searchable-solr-guide-7-3/query-re-ranking.html#query-re-ranking
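
A minimal sketch (the boost and document counts are only an example):

q=<user query>&rq={!rerank reRankQuery=$rqq reRankDocs=10 reRankWeight=2}&rqq=genre:comedy

This re-orders only the top 10 documents of the original result set.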


On Mon, Sep 3, 2018 at 1:02 PM mama  wrote:

> Hi
> We have requirement to boost only first few records & rest of result should
> be as per search.
> e.g. if i have books of different genre & if user search for some book
> (intrested in genere : comedy) then
> we want to show say first 3 records of genre:comedy and rest of results
> should be of diff genre .
> Reason for this is , we have lots of books in db , if we boost comedy genre
> then first 100s of records will be comedy and user may not be aware of
> other
> books.
> is it possible ?
>
> Query for boosting genre comedy
>  genre:comedy^0.5
>
> can someone help with requirement of limiting boost to first few records ?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr suggestions: why are exact matches omitted

2018-09-03 Thread Mikhail Khludnev
I'm afraid only thorough debugging might answer.

On Mon, Sep 3, 2018 at 1:58 PM Clemens Wyss DEV 
wrote:

> Sorry for not giving up on this issue:
> is this "behavior" a feature or a bug?
>
> -Original Message-
> From: Clemens Wyss DEV 
> Sent: Thursday, 30 August 2018 18:01
> To: 'solr-user@lucene.apache.org' 
> Subject: Solr suggestions: why are exact matches omitted
>
> Given the following configuration:
> ...
> <searchComponent name="suggest_word_fuzzy" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggest_word_fuzzy</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
>     <str name="buildOnCommit">true</str>
>     <str name="field">_my_suggest_word</str>
>     <int name="maxEdits">2</int>
>     <float name="threshold">0.01</float>
>     <float name="maxQueryFrequency">.01</float>
>     <str name="suggestAnalyzerFieldType">suggest_word</str>
>   </lst>
> </searchComponent>
> <requestHandler name="/suggest_word_fuzzy" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck.onlyMorePopular">false</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.collate">true</str>
>   </lst>
> </requestHandler>
> ...
> When I try to find suggestions for "11000.35" I get
> "11000.33"
> "11000.34"
> "11000.36"
> "11000.37"
> ...
> but not "11000.35", although "11000.35" exists (and is suggested when I
> for example type "11000.34")
>
> Thx in advance
> - Clemens
>


-- 
Sincerely yours
Mikhail Khludnev


How long does a query?q=field1:2312 should cost? exactly hit one document.

2018-09-03 Thread zhenyuan wei
Hi,
   I am curious how long a query q=field1:2312 that matches exactly one
document should take. Of course, assume there is no queryResultCache hit in
this situation.
   In fact my QTime is 150ms+, which seems too long.


AW: Solr suggestions: why are exact matches omitted

2018-09-03 Thread Clemens Wyss DEV
> I'm afraid only thorough debugging might answer
I'd say debugging is only required if everybody (not just me 😉) expects to
get "the exact match" in the spellcheck response... If it's nonsense to expect
"the exact match" in the spellcheck response, then it's a feature of
spellchecking.

-Original Message-
From: Mikhail Khludnev 
Sent: Monday, 3 September 2018 13:17
To: solr-user 
Subject: Re: Solr suggestions: why are exact matches omitted

I'm afraid only thorough debugging might answer.

On Mon, Sep 3, 2018 at 1:58 PM Clemens Wyss DEV 
wrote:

> Sorry for not giving up on this issue:
> is this "behavior" a feature or a bug?
>
> -Original Message-
> From: Clemens Wyss DEV 
> Sent: Thursday, 30 August 2018 18:01
> To: 'solr-user@lucene.apache.org' 
> Subject: Solr suggestions: why are exact matches omitted
>
> Given the following configuration:
> ...
> <searchComponent name="suggest_word_fuzzy" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggest_word_fuzzy</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
>     <str name="buildOnCommit">true</str>
>     <str name="field">_my_suggest_word</str>
>     <int name="maxEdits">2</int>
>     <float name="threshold">0.01</float>
>     <float name="maxQueryFrequency">.01</float>
>     <str name="suggestAnalyzerFieldType">suggest_word</str>
>   </lst>
> </searchComponent>
> <requestHandler name="/suggest_word_fuzzy" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck.onlyMorePopular">false</str>
>     <str name="spellcheck.extendedResults">false</str>
>     <str name="spellcheck.collate">true</str>
>   </lst>
> </requestHandler>
> ...
> When I try to find suggestions for "11000.35" I get "11000.33"
> "11000.34"
> "11000.36"
> "11000.37"
> ...
> but not "11000.35", although "11000.35" exists (and is suggested when 
> I for example type "11000.34")
>
> Thx in advance
> - Clemens
>


--
Sincerely yours
Mikhail Khludnev


Re: How long does a query?q=field1:2312 should cost? exactly hit one document.

2018-09-03 Thread Erik Hatcher
Add debug=true and see where the time goes, in which components? 

Highlighting is my culprit guess.   Or faceting?

> On Sep 3, 2018, at 07:45, zhenyuan wei  wrote:
> 
> Hi ,
>   I am curious “How long does a  query q=field1:2312 cost ,   which
> exactly match only one document? ”,  Of course we just discuss  no
> queryResultCache with match in this situation.
>   In fact  my QTime is  150ms+, it is too long.


Re: java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:421)

2018-09-03 Thread asis.ind...@gmail.com
Hi,

Thanks for posting this; I was getting the same error and had the same
stored=false ID field.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Streaming timeseries() and buckets with no docs

2018-09-03 Thread Jan Høydahl
Hi

We have a timeseries expression with gap="+1DAY" and a sum(imps_l) to aggregate 
sums of an integer for each bucket.
Now, some day buckets do not contain any documents at all, and instead of 
returning a tuple with value 0, it returns
a tuple with no entry at all for the sum, see the bucket for date_dt 2018-06-22 
below:

{
  "result-set": {
"docs": [
  {
"sum(imps_l)": 0,
"date_dt": "2018-06-21",
"count(*)": 5
  },
  {
"date_dt": "2018-06-22",
"count(*)": 0
  },
  {
"EOF": true,
"RESPONSE_TIME": 3
  }
]
  }
}


Now when we want to convert this into a column using col(a,'sum(imps_l)'),
that array will contain mostly numbers but also some string entries,
'sum(imps_l)', which is the key name. I need purely integers in the column.

Should the timeseries() have output values for all functions even if there are 
no documents in the bucket?
Or is there something similar to the select() expression that can take a stream 
of tuples not originating directly
from search() and replace values? Or is there perhaps a function that can loop 
through the column produced by col()
and replace non-numeric values with 0?
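
Something like this is what I am after, in case select() with the replace()
operation also works on a timeseries() stream (untested sketch):

select(timeseries(...),
       date_dt,
       count(*),
       sum(imps_l),
       replace(sum(imps_l), null, withValue=0))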

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: How long does a query?q=field1:2312 should cost? exactly hit one document.

2018-09-03 Thread zhenyuan wei
Only a term query q=field1:2312, no other conditions.
I tried debugging, but cannot find out what the main cost is.
The debug=timing output looks like this:

{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":157,
"params":{
  "q":"v00_s:15de21c670ae7c3f6f3f1f37029303c9",
  "debug":"timing"}},
  "response":{"numFound":1,"start":0,"maxScore":17.099754,"docs":[
  {
"v00_s":"15de21c670ae7c3f6f3f1f37029303c9",
"v01_s":"7596295605015",
"v02_s":"Mp9XkmrRXhFChgMAGoydOvAD",
"v03_l":555,
"v04_s":"55",
"v05_s":"39994237071313698949",
"v06_s":"3",
"v07_s":"155",
"v08_s":"5",
"v09_s":"15",
"v10_s":"15",
"v11_s":"555",
"v12_s":"43819292",
"v13_s":"549754428",
"v14_s":"8111596961",
"id":"000555",
"_version_":1610106981630083079}]
  },
  "debug":{
"timing":{
  "time":6336.0,
  "prepare":{
"time":8.0,
"query":{
  "time":8.0},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}},
  "process":{
"time":6270.0,
"query":{
  "time":6268.0,
  "doProcessSearchByIds":{
"time":0.0},
  "doProcessUngroupedSearch":{
"time":6265.0,
"search":{
  "time":6261.0,
  "getDocListC":{
"time":6261.0,
"lookup_queryResultCache":{
  "time":0.0},
"lookupNotuseFilterCacheTimer":{
  "time":6258.0},
"getDocListNCTimer":{
  "time":6258.0,
  "getProcessedFilter":{
"time":0.0},
  "buildAndRunCollectorChain2":{
"time":6255.0},
  "topDocs":{
"time":0.0,
"doFieldSortValues":{
  "time":0.0},
"doPrefetch":{
  "time":0.0}}},
"facet":{
  "time":0.0},
"facet_module":{
  "time":0.0},
"mlt":{
  "time":0.0},
"highlight":{
  "time":0.0},
"stats":{
  "time":0.0},
"expand":{
  "time":0.0},
"terms":{
  "time":0.0},
"debug":{
  "time":0.0}


My request is :  curl "
http://emr-worker-2:8983/solr/collection001/query?q=v00_s:
15de21c670ae7c3f6f3f1f37029303c9&debug=timing"

I also hoped to use debug=true to find out more, so I added some
sub-timers to trace which sub-method is slow.
As shown above, SolrIndexSearcher.buildAndRunCollectorChain()
costs the most.
(If I want to find out the answer, I think I have no choice but to
debug-trace deeper into Lucene-level methods.)

At this moment I have another question too: why is the debug time 6336.0 ms,
which is greater than QTime=157?

Erik Hatcher wrote on Mon, Sep 3, 2018 at 8:30 PM:

> Add debug=true and see where the time goes, in which components?
>
> Highlighting is my culprit guess.   Or faceting?
>
> > On Sep 3, 2018, at 07:45, zhenyuan wei  wrote:
> >
> > Hi ,
> >   I am curious “How long does a  query q=field1:2312 cost ,   which
> > exactly match only one document? ”,  Of course we just discuss  no
> > queryResultCache with match in this situation.
> >   In fact  my QTime is  150ms+, it is too long.
>



Re: Replacing Double Quotes from a field

2018-09-03 Thread Shawn Heisey

On 9/3/2018 1:51 AM, Gopesh Sharma wrote:

I am trying to remove the double quotes from a field, and that's why I've
written a PatternReplaceCharFilterFactory, but it doesn't seem to be working.


When you say it's not working, how precisely are you checking?  If 
you're looking at the field value in search results, that will NEVER 
change because of analysis.  No matter what your analysis does, search 
results will not be affected by it.






Using "$1" here isn't valid.  Your pattern doesn't include any 
parentheses, which is how regex marks the groups that can then be 
accessed with $1, $2, etc.  Try using "" for the replacement instead -- 
replace a quote with the empty string.
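
For example, something like this (a sketch; adjust it to your actual analyzer
chain):

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="&quot;" replacement=""/>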



I also tried to replace it at query time, but Solr throws an error that the
entity must be closed with >


There's not enough information here to troubleshoot.


select replace(t.name, '\"', '') as NAME from wp_31_term_taxonomy


This is SQL syntax, which almost always doesn't apply to Solr.  What 
were you trying to say by including this?


Thanks,
Shawn



Re: MLT in Cloud Mode - Not Returning Fields?

2018-09-03 Thread Doug Turnbull
Thanks Charlie, those are helpful.

I think at this point we will attach a debugger and see what shakes out.
Perhaps it's one of these cases you list. Perhaps we're missing something.
We'll report back.

-Doug

On Mon, Sep 3, 2018 at 5:23 AM Charlie Hull  wrote:

> On 31/08/2018 19:36, Doug Turnbull wrote:
> > Hello,
> >
> > We're working on a Solr More Like This project (Solr 6.6.2), using the
> More
> > Like This searchComponent. What we note is in standalone Solr, when we
> > request MLT using the search component, we get every more like this
> > document fully formed with complete fields in the moreLikeThis section.
>
> Hey Doug,
>
> IIRC there wasn't a lot of support for MLT in cloud mode a few years
> ago, and there are certainly still a few open issues around cloud support:
> https://issues.apache.org/jira/browse/SOLR-4414
> https://issues.apache.org/jira/browse/SOLR-5480
> Maybe there are some hints in the ticket comments about different ways
> to do what you want.
>
> Cheers
>
> Charlie
>
> >
> > In cloud, however, with the exact same query and config, we only get the
> > doc ids under "moreLikeThis" requiring us to fetch the metadata
> associated
> > with each document.
> >
> > I can't easily share an example due to confidentiality, but I want to
> check
> > if we're missing something? Documentation doesn't mention any
> limitations.
> > The only interesting note I've found is this one which points to a
> > potential difference in behavior
> >
> >>   The Cloud MLT Query Parser uses the realtime get handler to retrieve
> the
> > fields to be mined for keywords. Because of the way the realtime get
> > handler is implemented, it does not return data for fields populated
> using
> > copyField.
> >
> > https://stackoverflow.com/a/46307140/8123
> >
> > Any thoughts?
> >
> > -Doug
> >
>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334 <+44%20870%20011%208334>
> mobile:  +44 (0)7767 825828 <+44%207767%20825828>
> web: www.flax.co.uk
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Contextual Synonym Filter

2018-09-03 Thread Vergantini Luca
I need to create a contextual synonym filter:

I need the synonym filter to load a different synonym configuration based on
the fq query parameter.

I've already modified the SynonymGraphFilterFactory to load from a DB (this is
another requirement), but I can't understand how to make the fq parameter
arrive at the factory.

Maybe I need a Query Parser plugin?

Please help


Luca Vergantini

Whitehall Reply
Via del Giorgione, 59
00147 - Roma - ITALY
phone: +39 06 844341
l.vergant...@reply.it
www.reply.it



Re: Contextual Synonym Filter

2018-09-03 Thread Andrea Gazzarini

Hi Luca,
I believe this is not an easy task to do by passing through Solr/Lucene
internals; did you try to use what Solr offers out of the box?
For example, you could define several fields, where each corresponding field
type uses a different synonym set. So you would have:


 * F1 -> FT1 -> SYNSET1
 * F2 -> FT2 -> SYNSET2
 * ...

if you query using fq=F1:something then FT1 (and SYNSET1)
will be used, if you query using fq=F2:something then FT2 (and
SYNSET2) will be used, and so on.
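
A minimal schema sketch of that idea (tokenizer choice and synonym file names
are placeholders):

<fieldType name="FT1" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synset1.txt"/>
  </analyzer>
</fieldType>
<!-- FT2 is identical except synonyms="synset2.txt", and so on -->

<field name="F1" type="FT1" indexed="true" stored="false"/>
<field name="F2" type="FT2" indexed="true" stored="false"/>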


I don't know your context, so my suggestion could be completely on the wrong
path.


Best,
Andrea

On 03/09/2018 15:41, Vergantini Luca wrote:


I need to create a contextual Synonym Filter:

I need that the Synonym Filter load different synonym configuration 
based on the fq query parameter.


I’ve already modified the SynonymGraphFilterFactory to load from DB 
(this is another requirement) but I can’t understand how to make the 
fq parameter arrive to the Factory.


Maybe I need a Query Parser plugin?

Please help



Luca Vergantini

Whitehall Reply
Via del Giorgione, 59
00147 - Roma - ITALY
phone: +39 06 844341
l.vergant...@reply.it 
www.reply.it

Whitehall Reply




Feature Selection and Model Training for Solr LTR

2018-09-03 Thread Zheng Lin Edwin Yeo
Hi,

I am in the process of setting up Solr LTR in Solr 7.4.0. I understand that
there are different types of models, like the Linear Model, the Multiple
Additive Trees Model, and the Neural Network Model.

Does anyone have information on which model is the most suitable for the best
performance when dealing with large amounts of data?

Regards,
Edwin


Re: How long does a query?q=field1:2312 should cost? exactly hit one document.

2018-09-03 Thread Erick Erickson
My guess is that you're searching un-warmed instances of Solr and are
seeing the time it takes to read the index structures into memory the
first time. What happens if you turn off indexing and query a number
of values (not the same one or you'll hit the queryResultCache).

So your first query would be:
"q":"v00_s:1"
your second
"q":"v00_s:2",

and so on. I'd expect to see decreasing QTimes and after the first few
a pretty steady response time.

Beyond that, what are your machine/index characteristics? How many
docs per replica? What version of Solr? How much heap allocated to the
JVM? How much RAM on the machine? You might review:
https://wiki.apache.org/solr/UsingMailingLists

Debug adds overhead to the query response time; I suspect that's what
you're seeing, although 6 seconds is surprisingly long.

Best,
Erick
On Mon, Sep 3, 2018 at 6:08 AM zhenyuan wei  wrote:
>
> Only a termQuery q=field1:2312, No other conditions.
> I try debug now, but can not find out  what is the main cost.
> Debug=timing output like :
>
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":157,
> "params":{
>   "q":"v00_s:15de21c670ae7c3f6f3f1f37029303c9",
>   "debug":"timing"}},
>   "response":{"numFound":1,"start":0,"maxScore":17.099754,"docs":[
>   {
> "v00_s":"15de21c670ae7c3f6f3f1f37029303c9",
> "v01_s":"7596295605015",
> "v02_s":"Mp9XkmrRXhFChgMAGoydOvAD",
> "v03_l":555,
> "v04_s":"55",
> "v05_s":"39994237071313698949",
> "v06_s":"3",
> "v07_s":"155",
> "v08_s":"5",
> "v09_s":"15",
> "v10_s":"15",
> "v11_s":"555",
> "v12_s":"43819292",
> "v13_s":"549754428",
> "v14_s":"8111596961",
> "id":"000555",
> "_version_":1610106981630083079}]
>   },
>   "debug":{
> "timing":{
>   "time":6336.0,
>   "prepare":{
> "time":8.0,
> "query":{
>   "time":8.0},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}},
>   "process":{
> "time":6270.0,
> "query":{
>   "time":6268.0,
>   "doProcessSearchByIds":{
> "time":0.0},
>   "doProcessUngroupedSearch":{
> "time":6265.0,
> "search":{
>   "time":6261.0,
>   "getDocListC":{
> "time":6261.0,
> "lookup_queryResultCache":{
>   "time":0.0},
> "lookupNotuseFilterCacheTimer":{
>   "time":6258.0},
> "getDocListNCTimer":{
>   "time":6258.0,
>   "getProcessedFilter":{
> "time":0.0},
>   "buildAndRunCollectorChain2":{
> "time":6255.0},
>   "topDocs":{
> "time":0.0,
> "doFieldSortValues":{
>   "time":0.0},
> "doPrefetch":{
>   "time":0.0}}},
> "facet":{
>   "time":0.0},
> "facet_module":{
>   "time":0.0},
> "mlt":{
>   "time":0.0},
> "highlight":{
>   "time":0.0},
> "stats":{
>   "time":0.0},
> "expand":{
>   "time":0.0},
> "terms":{
>   "time":0.0},
> "debug":{
>   "time":0.0}
>
>
> My request is :  curl "
> http://localhost:8983/solr/collection001/query?q=v00_s
> :
> 15de21c670ae7c3f6f3f1f37029303c9&debug=timing"
>
> I  also hope to using debug=true to find out more things,so I added some
> sub timer to trace which sub method is slowly.
> And found , as above,  the "SolrIndexSearch.buildAndRunCollectorChain() "
> cost a lot。
> ( if I want to  find out the answer, I think I have not idea but  debug
> tracing  to deeper into lucene level method.)
>
> *At this moment, I have another question too, why  debug time is 6336.0,
> which less than QTime=157 ?*
>
> Erik Hatcher wrote on Mon, Sep 3, 2018 at 8:30 PM:
>
> > Add debug=true and see where the time goes, in which components?
> >
> > Highlighting is my culprit guess.   Or faceting?
> >
> > > On Sep 3, 2018, at 07:45, zhenyuan wei  wrote:
> > >
> > > Hi ,
> > >   I am curious “How long does a  query q=field1:2312 cost ,   which
> > > exactly match only one document? ”,  Of course we just discuss  no
> > > queryResultCache with match in this situation.
> > >   In fact  my QTime is  150ms+, it is too long.
> >


Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Björn Häuser
Hello,

we recently upgraded our SolrCloud (5 nodes, 25 collections, 1 shard each, 4
replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are running
ZooKeeper 3.4.13.

Since the upgrade to 7.3.0, and also on 7.4.0, we have been encountering heap
space exhaustion. After obtaining a heap dump, it looks like we have a lot of
IndexSearchers open for our largest collection.

The dump contains around ~60 IndexSearchers, each holding around ~40 MB of
heap. Another 500 MB of heap is the fieldcache, which is expected in my opinion.

The current config can be found here: 
https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 


Analyzing the heap dump, Eclipse MAT says this:

Problem Suspect 1

91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 1.981.148.336 
(38,26%) bytes. 

Biggest instances:

• org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 70.087.272 
(1,35%) bytes. 
• org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 65.678.264 
(1,27%) bytes. 
• org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 63.050.600 
(1,22%) bytes. 


Problem Suspect 2

223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 1.373.110.208 
(26,52%) bytes. 


Any help is appreciated. Thank you very much!
Björn

Re: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Erick Erickson
I would expect at least 1 IndexSearcher per replica; how many total
replicas are hosted in your JVM?

Plus, if you're actively indexing, there may temporarily be 2
IndexSearchers open while the new searcher warms.

And there may be quite a few caches, at least queryResultCache and
filterCache and documentCache, one of each per replica and maybe two
(for queryResultCache and filterCache) if you have a background
searcher autowarming.

At a glance, your autowarm counts are very high, so it may take some
time to autowarm, leading to multiple IndexSearchers and caches open
per replica when you happen to hit a commit point. I usually start
with 16-20 as an autowarm count; the benefit decreases rapidly as you
increase the count.
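
For illustration, autowarm is set per cache in solrconfig.xml; a minimal
sketch (the cache sizes here are just placeholders):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="16"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="16"/>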

I'm not quite sure why it would be different in 7x .vs. 6x. How much
heap do you allocate to the JVM? And do you see similar heap dumps in
6.6?

Best,
Erick
On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser  wrote:
>
> Hello,
>
> we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard each, 4 
> replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are running 
> Zookeeper 4.1.13.
>
> Since the upgrade to 7.3.0 and also 7.4.0 we encountering heap space 
> exhaustion. After obtaining a heap dump it looks like that we have a lot of 
> IndexSearchers open for our largest collection.
>
> The dump contains around ~60 IndexSearchers, and each containing around ~40mb 
> heap. Another 500MB of heap is the fieldcache, which is expected in my 
> opinion.
>
> The current config can be found here: 
> https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 
> 
>
> Analyzing the heap dump eclipse MAT says this:
>
> Problem Suspect 1
>
> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
> 1.981.148.336 (38,26%) bytes.
>
> Biggest instances:
>
> • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 70.087.272 
> (1,35%) bytes.
> • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 65.678.264 
> (1,27%) bytes.
> • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 63.050.600 
> (1,22%) bytes.
>
>
> Problem Suspect 2
>
> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
> 1.373.110.208 (26,52%) bytes.
>
>
> Any help is appreciated. Thank you very much!
> Björn


Re: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Björn Häuser
Hi Erick,

thank you for your answer.

Unfortunately I do not have a heap dump from 6.6.


> On 3. Sep 2018, at 20:48, Erick Erickson  wrote:
> 
> I would expect at least 1 IndexSearcher per replica, how many total
> replicas hosted in your JVM?

27 replicas per JVM.

> 
> Plus, if you're actively indexing, there may temporarily be 2
> IndexSearchers open while the new searcher warms.
> 
> And there may be quite a few caches, at least queryResultCache and
> filterCache and documentCache, one of each per replica and maybe two
> (for queryResultCache and filterCache) if you have a background
> searcher autowarming.
> 
> At a glance, your autowarm counts are very high, so it may take some
> time to autowarm leading to multiple IndexSearchers and caches open
> per replica when you happen to hit a commit point. I usually start
> with 16-20 as an autowarm count, the benefit decreases rapidly as you
> increase the count.

As a countermeasure, I have now reduced the autowarm counts to 10 via API
calls. Let me see if the system is more stable now. Tomorrow morning I will
create a new heap dump to see if the searchers are still there.

Is there any metric which could tell me that without a heap dump?

> 
> I'm not quite sure why it would be different in 7x .vs. 6x. How much
> heap do you allocate to the JVM? And do you see similar heap dumps in
> 6.6?
> 
> Best,
> Erick

Thanks Erick!

 Björn


> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser  wrote:
>> 
>> Hello,
>> 
>> we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard each, 4 
>> replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are 
>> running Zookeeper 4.1.13.
>> 
>> Since the upgrade to 7.3.0 and also 7.4.0 we encountering heap space 
>> exhaustion. After obtaining a heap dump it looks like that we have a lot of 
>> IndexSearchers open for our largest collection.
>> 
>> The dump contains around ~60 IndexSearchers, and each containing around 
>> ~40mb heap. Another 500MB of heap is the fieldcache, which is expected in my 
>> opinion.
>> 
>> The current config can be found here: 
>> https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 
>> 
>> 
>> Analyzing the heap dump eclipse MAT says this:
>> 
>> Problem Suspect 1
>> 
>> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
>> 1.981.148.336 (38,26%) bytes.
>> 
>> Biggest instances:
>> 
>>• org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 70.087.272 
>> (1,35%) bytes.
>>• org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 65.678.264 
>> (1,27%) bytes.
>>• org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 63.050.600 
>> (1,22%) bytes.
>> 
>> 
>> Problem Suspect 2
>> 
>> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
>> 1.373.110.208 (26,52%) bytes.
>> 
>> 
>> Any help is appreciated. Thank you very much!
>> Björn



RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Markus Jelsma
Hello,

Getting an OOM plus the fact that you have a lot of IndexSearcher instances
rings a familiar bell. One of our collections had the same issue [1] when we
attempted an upgrade from 7.2.1 to 7.3.0. I managed to rule out all our custom
Solr code but had to keep our Lucene filters in the schema; the problem
persisted.

The odd thing, however, is that you appear to have the same problem, but not
with 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can you confirm
whether the problem is also present in 7.3.0?

You should see the instance count for IndexSearcher increase by one for each 
replica on each commit.

Regards,
Markus

[1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html 

 
 
-Original message-
> From:Erick Erickson 
> Sent: Monday 3rd September 2018 20:49
> To: solr-user 
> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
> 
> I would expect at least 1 IndexSearcher per replica, how many total
> replicas hosted in your JVM?
> 
> Plus, if you're actively indexing, there may temporarily be 2
> IndexSearchers open while the new searcher warms.
> 
> And there may be quite a few caches, at least queryResultCache and
> filterCache and documentCache, one of each per replica and maybe two
> (for queryResultCache and filterCache) if you have a background
> searcher autowarming.
> 
> At a glance, your autowarm counts are very high, so it may take some
> time to autowarm leading to multiple IndexSearchers and caches open
> per replica when you happen to hit a commit point. I usually start
> with 16-20 as an autowarm count, the benefit decreases rapidly as you
> increase the count.
> 
> I'm not quite sure why it would be different in 7x .vs. 6x. How much
> heap do you allocate to the JVM? And do you see similar heap dumps in
> 6.6?
> 
> Best,
> Erick
> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser  wrote:
> >
> > Hello,
> >
> > we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard each, 
> > 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are 
> > running Zookeeper 4.1.13.
> >
> > Since the upgrade to 7.3.0 and also 7.4.0 we encountering heap space 
> > exhaustion. After obtaining a heap dump it looks like that we have a lot of 
> > IndexSearchers open for our largest collection.
> >
> > The dump contains around ~60 IndexSearchers, and each containing around 
> > ~40mb heap. Another 500MB of heap is the fieldcache, which is expected in 
> > my opinion.
> >
> > The current config can be found here: 
> > https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 
> > 
> >
> > Analyzing the heap dump eclipse MAT says this:
> >
> > Problem Suspect 1
> >
> > 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
> > "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
> > 1.981.148.336 (38,26%) bytes.
> >
> > Biggest instances:
> >
> > • org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 
> > 70.087.272 (1,35%) bytes.
> > • org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 
> > 65.678.264 (1,27%) bytes.
> > • org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 
> > 63.050.600 (1,22%) bytes.
> >
> >
> > Problem Suspect 2
> >
> > 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
> > "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
> > 1.373.110.208 (26,52%) bytes.
> >
> >
> > Any help is appreciated. Thank you very much!
> > Björn
> 


Re: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Erick Erickson
Reducing to 10 won't be definitive, but if the problem gets better
it'll be a clue.

How are you committing? Is it just based on the solrconfig settings or
do you have any clients submitting commit commands?

One fat clue would be if, in your solr logs, you were getting any
warnings about "too many on deck searchers" (going from memory here,
exact wording may differ). That's an indication that your autowarm
times are taking longer than 20 seconds (your soft commit interval),
which would point to excessive autowarming being _part_ of the
problem. This assumes you're indexing steadily.

Still, though, changing from 6.6 to 7x shouldn't be that much different.

It's possible that you were running close to your heap limit with 6.6
and a relatively small difference in heap usage with 7x threw you over
the tipping point, but that's just hand-waving on my part.

And I'm guessing this is a prod system so experiments aren't tolerable...

As for what you can measure: starting with 6.4 there are about a zillion
metrics; try http://host:port/solr/admin/metrics for the complete list and
pick and choose.

Note that there are ways to cut down on how much is reported, I
suspect you'll be interested first in:
http://localhost:8983/solr/admin/metrics?prefix=SEARCHER

https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html

These tend to be on a per-core (replica) basis so you may have to do
some aggregating.

Good luck!
Erick

Re: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Björn Häuser
Hi Markus,

this reads exactly like what we have. Were you able to figure out anything? 
We are currently thinking about rolling back to 7.2.1. 



> On 3. Sep 2018, at 21:54, Markus Jelsma  wrote:
> 
> Hello,
> 
> Getting an OOM plus the fact you are having a lot of IndexSearcher instances 
> rings a familiar bell. One of our collections has the same issue [1] when we 
> attempted an upgrade 7.2.1 > 7.3.0. I managed to rule out all our custom Solr 
> code but had to keep our Lucene filters in the schema, the problem persisted.
> 
> The odd thing, however, is that you appear to have the same problem, but not 
> with 7.3.0? Since you upgraded to 7.4.0 shortly after 7.3.0, can you confirm 
> the problem is not also in 7.3.0? 
> 

We had very similar problems with 7.3.0 but never analyzed them and just 
updated to 7.4.0 because I thought that was the bug we hit: 
https://issues.apache.org/jira/browse/SOLR-11882 



> You should see the instance count for IndexSearcher increase by one for each 
> replica on each commit.


Sorry, where can I find this? ;) I did not find anything. 

Thanks
Björn

> 
> Regards,
> Markus
> 
> [1] http://lucene.472066.n3.nabble.com/RE-7-3-appears-to-leak-td4396232.html 
> 
> 
> 
> -Original message-
>> From:Erick Erickson 
>> Sent: Monday 3rd September 2018 20:49
>> To: solr-user 
>> Subject: Re: Heap Memory Problem after Upgrading to 7.4.0
>> 
>> I would expect at least 1 IndexSearcher per replica, how many total
>> replicas hosted in your JVM?
>> 
>> Plus, if you're actively indexing, there may temporarily be 2
>> IndexSearchers open while the new searcher warms.
>> 
>> And there may be quite a few caches, at least queryResultCache and
>> filterCache and documentCache, one of each per replica and maybe two
>> (for queryResultCache and filterCache) if you have a background
>> searcher autowarming.
>> 
>> At a glance, your autowarm counts are very high, so it may take some
>> time to autowarm leading to multiple IndexSearchers and caches open
>> per replica when you happen to hit a commit point. I usually start
>> with 16-20 as an autowarm count, the benefit decreases rapidly as you
>> increase the count.
>> 
>> I'm not quite sure why it would be different in 7x vs. 6x. How much
>> heap do you allocate to the JVM? And do you see similar heap dumps in
>> 6.6?
>> 
>> Best,
>> Erick
>> On Mon, Sep 3, 2018 at 10:33 AM Björn Häuser  wrote:
>>> 
>>> Hello,
>>> 
>>> we recently upgraded our solrcloud (5 nodes, 25 collections, 1 shard each, 
>>> 4 replicas each) from 6.6.0 to 7.3.0 and shortly after to 7.4.0. We are 
>>> running Zookeeper 4.1.13.
>>> 
>>> Since the upgrade to 7.3.0 and also 7.4.0 we have been encountering heap 
>>> space exhaustion. After obtaining a heap dump it looks like we have a lot 
>>> of IndexSearchers open for our largest collection.
>>> 
>>> The dump contains around 60 IndexSearchers, each holding around 40 MB of 
>>> heap. Another 500 MB of heap is the fieldcache, which is expected in 
>>> my opinion.
>>> 
>>> The current config can be found here: 
>>> https://gist.github.com/bjoernhaeuser/327a65291ac9793e744b87f0a561e844 
>>> 
>>> 
>>> Analyzing the heap dump eclipse MAT says this:
>>> 
>>> Problem Suspect 1
>>> 
>>> 91 instances of "org.apache.solr.search.SolrIndexSearcher", loaded by 
>>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
>>> 1.981.148.336 (38,26%) bytes.
>>> 
>>> Biggest instances:
>>> 
>>>• org.apache.solr.search.SolrIndexSearcher @ 0x6ffd47ea8 - 
>>> 70.087.272 (1,35%) bytes.
>>>• org.apache.solr.search.SolrIndexSearcher @ 0x79ea9c040 - 
>>> 65.678.264 (1,27%) bytes.
>>>• org.apache.solr.search.SolrIndexSearcher @ 0x6855ad680 - 
>>> 63.050.600 (1,22%) bytes.
>>> 
>>> 
>>> Problem Suspect 2
>>> 
>>> 223 instances of "org.apache.solr.util.ConcurrentLRUCache", loaded by 
>>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6807d1048" occupy 
>>> 1.373.110.208 (26,52%) bytes.
>>> 
>>> 
>>> Any help is appreciated. Thank you very much!
>>> Björn
>> 



Re: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Björn Häuser
Hi,


> On 3. Sep 2018, at 22:18, Erick Erickson  wrote:
> 
> Reducing to 10 won't be definitive, but if the problem gets better
> it'll be a clue.
> 
> How are you committing? Is it just based on the solrconfig settings or
> do you have any clients submitting commit commands?

Only through the auto commits, no manual committing from the application.
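
For reference, a quick way to double-check the effective commit settings on 
a live node is the Config API (the collection name is a placeholder); it 
returns the updateHandler section, including autoCommit and autoSoftCommit:

    curl 'http://localhost:8983/solr/mycollection/config/updateHandler'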

> 
> One fat clue would be if, in your solr logs, you were getting any
> warnings about "too many on deck searchers" (going from memory here,
> exact wording may differ). That's an indication that your autowarm
> times are taking longer than 20 seconds (your soft commit interval),
> which would point to excessive autowarming being _part_ of the
> problem. This assumes you're indexing steadily.

I searched our logs and could not find any evidence for this. I searched for:

- searchers
- auto
- warmup

There was nothing about too many searchers, which would mean they are actually 
leaking and not just too many warming up, right?

> 
> Still, though, changing from 6.6 to 7x shouldn't be that much different.
> 
> It's possible that you were running close to your heap limit with 6.6
> and a relatively small difference in heap usage with 7x threw you over
> the tipping point, but that's just hand-waving on my part.
> 

I really thought about this, but back in our 6.6 days we had a lot of headroom 
in the young generation and also very low GC timings.


> And I'm guessing this is a prod system so experiments aren't tolerable…

What do you have in mind? Increasing memory? That's something we have to do 
anyway - if it helps.
Our current setup is not very stable anyway, so we have some room for 
experiments.

> 
> What you can measure. Starting with 6.4 there are about a zillion metrics,
> try: http://host:port/solr/admin/metrics for the complete list and
> pick and choose.
> 
> Note that there are ways to cut down on how much is reported, I
> suspect you'll be interested first in:
> http://localhost:8983/solr/admin/metrics?prefix=SEARCHER
> 
> https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html
> 

Funny thing is that we tried to use the Prometheus exporter for these metrics, 
but whenever we started it, it killed our Solr node immediately. 

I will try to look into these metrics, but so far looking at them has yielded 
no valuable results for me. All metrics are “fine”. 

Is there anything special you would take a look at?

> These tend to be on a per-core (replica) basis so you may have to do
> some aggregating.
> 
> Good luck!


Thank you very much :)
Björn

> Erick

RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Markus Jelsma
Hello Björn,

Take great care: 7.2.1 cannot read an index written by 7.4.0, so you cannot 
simply roll back - you would need to reindex! 

Andrey Kudryavtsev made a good suggestion in the thread on how to find the 
culprit, but it will be a tedious task. I have not yet had the time or courage 
to venture there.
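
In the meantime, a cheap way to watch whether searcher instances keep 
accumulating is a JVM class histogram (the pid 12345 is a placeholder; note 
that -histo:live forces a full GC so that only live objects are counted):

    # Run before and after a couple of commits; a steadily growing count
    # points at a leak rather than normal searcher turnover.
    jmap -histo:live 12345 | grep SolrIndexSearcher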

Hope it helps,
Markus

 
 

Re: BUMP: Atomic updates and POST command?

2018-09-03 Thread Scott Prentice

Thanks, Shawn. That helps with the meaning of the "solr" format.

Our needs are pretty basic. We have some upstream processes that crawl 
the data and generate a JSON feed that works with the default post 
command. So far this works well and keeps things simple.


Thanks!
...scott


On 9/1/18 9:26 PM, Shawn Heisey wrote:

On 8/31/2018 7:18 PM, Scott Prentice wrote:

Yup. That does the trick! Here's my command line ...

    $ ./bin/post -c core01 -format solr /home/xtech/solrtest/test1b.json

I saw that "-format solr" option, but it wasn't clear what it did. 
It's still not clear to me how that changes the endpoint to allow for 
updates. But nice to see that it works! 


I think the assumption with JSON-style updates and the post tool is 
that you are sending "generic" json documents, not Solr-formatted json 
commands.  So the post tool sends to the /update/json/docs handler, 
which can handle those easily.  I believe that telling it that the 
format is "solr" means that the JSON input is instructions to Solr, 
not just document content.  It very likely sends it to /update/json or 
/update when that's the case.
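
For illustration, a minimal sketch of the two payload styles (the field 
values are made up, and the atomic update assumes a uniqueKey and an 
updateLog are configured):

    # "Generic" JSON documents, which the post tool sends to /update/json/docs:
    curl 'http://localhost:8983/solr/core01/update/json/docs?commit=true' \
      -H 'Content-Type: application/json' \
      -d '{"id":"doc1","title":"original title"}'

    # Solr-formatted JSON, e.g. an atomic update, sent to /update:
    curl 'http://localhost:8983/solr/core01/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id":"doc1","title":{"set":"updated title"}}]'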


I don't know if you know this, but the bin/post command calls 
something in Solr that is named SimplePostTool.  It is, as that name 
suggests, a very simple tool.  Although you CAN use it in production, 
a large percentage of users find that they outgrow its capabilities 
and must move to writing their own indexing system.


Thanks,
Shawn






solr how to show different documents for different users in a better way

2018-09-03 Thread guanqunyin
Hi,


Currently I have integrated Solr into my project, but I have run into some 
problems.
Our project is an archives management system and different users have access to 
different documents, so we have two ways to filter by permissions:
1. We can fetch all the documents from Solr that match the search conditions 
and then do the paging in our own code. Since Solr pages with 10 docs per 
request by default, we have to search twice (the first time to get numFound, 
the second time to get all the docs). This way the time consumed is large and 
will grow quickly as more documents are added.
2. We can use the default paging provided by Solr and filter by permissions by 
adding all the document IDs a user may reach to the search request (a sketch 
of such a request is below). With this approach another problem arises: if the 
number of documents one user can reach grows larger than maxBooleanClauses, 
Solr will throw an error. And I think the Solr caches will not work well this 
way either.
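
A minimal sketch of approach 2 (collection name, fields and IDs are made up):

    # Naive form: one Boolean clause per reachable document ID, so it is
    # capped by maxBooleanClauses.
    curl 'http://localhost:8983/solr/archives/select' \
      --data-urlencode 'q=title:report' \
      --data-urlencode 'fq=id:(doc1 OR doc2 OR doc3)' \
      --data-urlencode 'start=0' --data-urlencode 'rows=10'

    # The {!terms} query parser builds the same filter without Boolean
    # clauses, so it is not subject to that limit.
    curl 'http://localhost:8983/solr/archives/select' \
      --data-urlencode 'q=title:report' \
      --data-urlencode 'fq={!terms f=id}doc1,doc2,doc3' \
      --data-urlencode 'start=0' --data-urlencode 'rows=10'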


So, about this problem, I want to ask: is there a better way?


Regards
guanqunyin