eDismax parser and the mm parameter

2014-03-29 Thread S.L
Hi All,

I am planning to use the eDismax query parser in Solr to boost documents
whose fields contain a given phrase. The eDismax parser accepts an mm
(minimum match) parameter, and since the query typed by the user could be
of any length (i.e. >= 1 term), I would like to set mm to 1. I have the
following questions about this parameter:

   1. Is it set to 1 by default?
   2. In my schema.xml the defaultOperator is set to "AND"; should I set it
   to "OR" in order for the eDismax parser to be effective with an mm of 1?


Thanks in advance!
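A minimal sketch of the kind of request being described, for illustration only; the field names, boost values, and core URL are assumptions, not taken from the thread:

```python
from urllib.parse import urlencode

def build_edismax_query(user_query,
                        solr_url="http://localhost:8983/solr/mycore/select"):
    """Build a select URL that matches on any single term (mm=1) and
    boosts documents whose fields contain the whole phrase (pf)."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "qf": "name description",       # query fields (hypothetical)
        "pf": "name^10 description^5",  # phrase-match boost (hypothetical)
        "mm": "1",                      # at least one term must match
    }
    return solr_url + "?" + urlencode(params)

print(build_edismax_query("Siberian Ginseng"))
```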


Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks, Jack! I understand the intent of the mm parameter; my question is
that since the queries users type are not of a fixed length, I do not know
what mm should be. For example, "Ginseng" and "Siberian Ginseng" are both
possible search terms: the first can have an mm of up to 1, and the second
an mm of up to 2.

Should I dynamically set the mm based on the number of search terms in my
query ?

Thanks again.
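For what it's worth, mm does not have to be a single fixed number: the dismax mm specification accepts conditional expressions such as "2<-1" ("for queries with more than two clauses, all but one must match"), which adapt to query length server-side. Below is a client-side sketch of the same rule, assuming naive whitespace tokenization (Solr's own analysis may split terms differently):

```python
def dynamic_mm(user_query):
    """Require all terms but one, with a floor of one matching term."""
    n_terms = len(user_query.split())
    return str(max(1, n_terms - 1))

for q in ("Ginseng", "Siberian Ginseng", "Red Siberian Ginseng"):
    print(q, "-> mm =", dynamic_mm(q))
```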


On Sun, Mar 30, 2014 at 5:20 AM, Jack Krupansky wrote:

> 1. Yes, the default for mm is 1.
>
> 2. It depends on what you are really trying to do - you haven't told us.
>
> Generally, mm=1 is equivalent to q.op=OR, and mm=100% is equivalent to
> q.op=AND.
>
> Generally, use q.op unless you really know what you are doing.
>
> Generally, the intent of mm is to set the minimum number of OR/SHOULD
> clauses that must match on the top level of a query.
>
> -- Jack Krupansky
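The equivalence described above, written out as request parameters (a sketch for illustration only):

```python
# For a multi-term query, these pairs behave alike at the top level:
or_default = {"defType": "edismax", "q.op": "OR"}
mm_one     = {"defType": "edismax", "mm": "1"}      # >= 1 clause must match

and_default = {"defType": "edismax", "q.op": "AND"}
mm_all      = {"defType": "edismax", "mm": "100%"}  # all clauses must match

print(mm_one, mm_all)
```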


Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks again, Jack.

I am searching Chinese medicine documents. As in the example I gave
earlier, a user can search for "Ginseng", "Siberian Ginseng", or "Red
Siberian Ginseng". I certainly want to use the pf parameter (which is not
driven by the mm parameter); however, to give a higher score to documents
that contain more of the query terms I want to use eDismax. If I set mm to
3 and the search term is only one word long (like "Ginseng"), what does
eDismax do?


On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky wrote:

> It still depends on your objective - which you haven't told us yet. Show
> us some use cases and detail what your expectations are for each use case.
>
> The edismax phrase boosting is probably a lot more useful than messing
> around with mm. Take a look at pf, pf2, and pf3.
>
> See:
> http://wiki.apache.org/solr/ExtendedDisMax
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
>
> The focus on mm may indeed be a classic "XY Problem" - a premature focus
> on a solution without detailing the problem.
>
> -- Jack Krupansky
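A sketch of what pf2 and pf3 add on top of pf; the field name `text` and the boost values are assumptions for illustration:

```python
from urllib.parse import urlencode

phrase_boost_params = {
    "defType": "edismax",
    "q": "Red Siberian Ginseng",
    "qf": "text",
    "pf": "text^20",   # whole-phrase match
    "pf3": "text^10",  # word trigrams: "Red Siberian Ginseng"
    "pf2": "text^5",   # word bigrams: "Red Siberian", "Siberian Ginseng"
}
print(urlencode(phrase_boost_params))
```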


Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Jack,

I mis-stated the problem: I am not using OR as the default operator now
(now that I think about it, it does not make sense to use OR as the default
operator together with the mm parameter). The reason I want to use pf and
mm in conjunction comes from my understanding of the eDismax parser; I have
not looked into the pf2 and pf3 parameters yet.

My understanding is as follows:

pf - used to boost a result's score when the complete phrase matches.
mm - set below the query length, would help limit the results to a certain
number of better matches.

That being said, would it make sense to set mm dynamically (to the number
of search terms minus 1)?

I also have a question about using fuzzy search with the eDismax parser,
but I will ask that in a separate post once I have gone through that aspect
of the parser.

Thanks again !





On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky wrote:

> If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
> will be dwarfed.
>
> The general goal is to assure that the top documents really are the best,
> not to necessarily limit the total document count. Focusing on the latter
> could be a real waste of time.
>
> It's still not clear why or how you need or want to use OR as the default
> operator - you still haven't given us a use case for that.
>
> To repeat: Give us a full set of use cases before taking this XY Problem
> approach of pursuing a solution before the problem is understood.
>
> -- Jack Krupansky


Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks, Jack. My use cases are as follows:


   1. Search for "Ginseng": everything related to ginseng should show up.
   2. Search for "White Siberian Ginseng": results containing the whole
   phrase should show up first, followed by results matching two words from
   the phrase, followed by results matching a single word.
   3. Fuzzy search for "Whte Sberia Ginsng" (note the typos): documents
   containing "White Siberian Ginseng" should show up. This looks like the
   most complicated case, as Solr does not support fuzzy phrase searches. (I
   have no solution for this yet.)

Thanks again!


On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky wrote:

> The mm parameter is really only relevant when the default operator is OR
> or explicit OR operators are used.
>
> Again: Please provide your use case examples and your expectations for
> each use case. It really doesn't make a lot of sense to prematurely focus
> on a solution when you haven't clearly defined your use cases.
>
> -- Jack Krupansky

Re: eDismax parser and the mm parameter

2014-03-31 Thread S.L
Jack,

Thanks a lot. I am now using pf, pf2, and pf3 and have removed the mm
parameter from my queries. However, for the fuzzy phrase queries I am not
sure how to leverage the complex phrase query parser; there is very little
out there that gives any idea of how to do it.

Why is fuzzy phrase search not provided by Solr out of the box? I am
surprised.

Thanks.
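For the record, the complex phrase query parser does accept fuzzy terms inside a quoted phrase. A hedged sketch of what such a request might look like; whether the edit distances and the `inOrder` setting suit the data is untested here:

```python
from urllib.parse import urlencode

def fuzzy_phrase_query(words, max_edits=2):
    """Build a {!complexphrase} query allowing up to `max_edits` edits
    per word, keeping word order."""
    fuzzy = " ".join(f"{w}~{max_edits}" for w in words)
    return urlencode({"q": '{!complexphrase inOrder=true}"%s"' % fuzzy})

print(fuzzy_phrase_query(["Whte", "Sberia", "Ginsng"]))
```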


On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky wrote:

> The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> (the default) and ignore the mm parameter. Give pf the highest boost, and
> boost pf3 higher than pf2.
>
> You could try using the complex phrase query parser for the third case.
>
> -- Jack Krupansky

Re: eDismax parser and the mm parameter

2014-04-02 Thread S.L
Thanks, Ahmet. I will definitely look into this; I appreciate it.
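The collation behavior referred to in the quoted replies can be requested with parameters along these lines; a sketch only, and the spellchecker itself must already be configured in solrconfig.xml:

```python
from urllib.parse import urlencode

spellcheck_params = {
    "q": "Whte Sberia Ginsng",
    "defType": "edismax",
    "spellcheck": "true",
    "spellcheck.count": "5",
    "spellcheck.collate": "true",         # combine per-word fixes into one query
    "spellcheck.maxCollationTries": "5",  # only return collations that match docs
}
print(urlencode(spellcheck_params))
```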


On Wed, Apr 2, 2014 at 7:47 PM, Ahmet Arslan  wrote:

> Yes, it has a spellcheck.collate parameter. It has many parameters, and
> with the correct combination of them it can suggest "White Siberian
> Ginseng" from "Whte Sberia Ginsng".
>
> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
>
>
>
> On Thursday, April 3, 2014 1:57 AM, "simpleliving...@gmail.com" <
> simpleliving...@gmail.com> wrote:
> Ahmet.
>
> Thanks I will look into this option . Does spellchecker support multiple
> word search terms?
>
> Sent from my HTC
>
> - Reply message -
> From: "Ahmet Arslan" 
> To: "solr-user@lucene.apache.org" 
> Subject: eDismax parser and the mm parameter
> Date: Wed, Apr 2, 2014 10:53 AM
>
> Hi SL,
>
> Instead of fuzzy queries, can't you use spell checker? Generally Spell
> Checker (a.k.a did you mean) is a preferred tool for typos.
>
> Ahmet
>
> On Wednesday, April 2, 2014 4:13 PM, "simpleliving...@gmail.com" <
> simpleliving...@gmail.com> wrote:
>
It only works for a single-word search term, not for multi-word search
terms.
>
> Sent from my HTC
>
> - Reply message -
> From: "William Bell" 
> To: "solr-user@lucene.apache.org" 
> Subject: eDismax parser and the mm parameter
> Date: Wed, Apr 2, 2014 12:03 AM
>
> Fuzzy is provided use ~
>
>

Re: eDismax parser and the mm parameter

2014-04-03 Thread S.L
Ahmet,

The SpellChecker seems to be exactly what I need for fuzzy-type search.
How can I combine it with the eDismax parser so that I can still make use
of parameters like pf, pf2, and pf3? Is there any resource you can point me
to for that?

Thanks.



Combining eDismax and SpellChecker

2014-04-05 Thread S.L
Hi All,

I want to suggest the correct phrase when a typo is made while searching,
and then run the search through the eDismax parser (pf, pf2, pf3); if no
typo is made, the search should go through the eDismax parser alone.

Is there a way to combine these two components? I have seen examples for
eDismax and for the SpellChecker, but nothing that combines the two.

Can you please let me know?

Thanks.


Apache Solr SpellChecker Integration with the default select request handler

2014-04-12 Thread S.L
Hello fellow Solr users,

I am using the default select request handler to search a Solr core, and I
also use the eDismax query parser.

   1. I want to integrate this with the spellchecker search component, so
   that when a search request comes in the spellchecker also gets called and
   I get a suggestion back with the search results.
   2. If the suggestion is above a certain threshold, I want the search to
   be made on that suggestion; otherwise the suggestion should come back
   along with the search results for the original search term.

To accomplish this, it seems I need to extend the SearchHandler class to
call the spellchecker internally and then search on the suggestion if it is
above a certain threshold.

I would really appreciate any examples of calling the SpellChecker
component via the API in Solr, and a sanity check on my approach. Thank you.


Re: Apache Solr SpellChecker Integration with the default select request handler

2014-04-12 Thread S.L
Yes, I use SolrJ, but only to index the data; the querying happens via the
default select request handler from a non-Java client.


On Sat, Apr 12, 2014 at 12:12 PM, Furkan KAMACI wrote:

> Hi;
>
> Do you use Solrj at your application? Why you did not consider to use to
> solve this with Solrj?
>
> Thanks;
> Furkan KAMACI


Re: Apache Solr SpellChecker Integration with the default select request handler

2014-04-12 Thread S.L
Furkan,

I am not sure how this could be a security concern; what I am actually
asking for is an approach to integrating the spellchecker search component
with the default request handler.

Thanks.


On Sat, Apr 12, 2014 at 5:38 PM, Furkan KAMACI wrote:

> Hi;
>
> I do not want to change the direction of your question, but it is really
> good, secure, and flexible to do this kind of thing at your client (a Java
> client or not). On the other hand, if you let people access your Solr
> instance directly, it can cause security issues.
>
> Thanks;
> Furkan KAMACI


Wordbreak spellchecker excessive breaking.

2014-05-24 Thread S.L
I am using the Solr wordbreak spellchecker, and the issue is that when I search
for a term like "mob ile", expecting that the wordbreak spellchecker would
actually return a suggestion for "mobile", it breaks the search term into
letters like "m o b". I have two issues with this behavior.

 1. How can I make Solr combine "mob ile" into "mobile"?
 2. Notwithstanding the fact that my search term "mob ile" is being broken
incorrectly into individual letters, I realize that the wordbreak is needed in
certain cases; how do I control the wordbreak so that it does not break the
term into letters like "m o b", which seems like excessive breaking to me?

Thanks.
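The combineWords side of WordBreakSolrSpellChecker essentially joins adjacent query tokens and keeps combinations that actually occur in the index. The toy sketch below illustrates that idea only; it is NOT Solr's implementation, and the vocabulary set stands in for the indexed terms of the spellcheck field.

```python
# Toy illustration of the combineWords idea: join adjacent query tokens and
# keep combinations present in the index vocabulary. This is a sketch of why
# "mob ile" can become "mobile" when the combined term is indexed, not
# Solr's actual algorithm.

def combine_candidates(tokens, vocabulary):
    suggestions = []
    for i in range(len(tokens) - 1):
        combined = tokens[i] + tokens[i + 1]
        if combined in vocabulary:
            suggestions.append((tokens[i], tokens[i + 1], combined))
    return suggestions

vocab = {"mobile", "wrangler", "ginseng"}
print(combine_candidates(["mob", "ile"], vocab))   # [('mob', 'ile', 'mobile')]
print(combine_candidates(["m", "o", "b"], vocab))  # []
```

The second call shows why combining never yields single letters: letters come from the breakWords direction, which is why limiting maxChanges (discussed in the reply below from the list) is the usual control.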


Re: Wordbreak spellchecker excessive breaking.

2014-05-26 Thread S.L
Anyone ?




Re: Wordbreak spellchecker excessive breaking.

2014-05-29 Thread S.L
James,

Thanks for clearly stating this; I was not able to find it documented
anywhere. Yes, I am using it with another spellchecker (Direct) with
collation on. I will try maxChanges and let you know.

On a side note, whenever I change a spellchecker parameter I need to
rebuild the index, and delete the Solr data directory before that, as my
Tomcat instance would not even start; can you let me know why?

Thanks.




On Tue, May 27, 2014 at 12:21 PM, Dyer, James 
wrote:

> You can do this if you set it up like in the main Solr example:
>
> <lst name="spellchecker">
>   <str name="name">wordbreak</str>
>   <str name="classname">solr.WordBreakSolrSpellChecker</str>
>   <str name="field">name</str>
>   <str name="combineWords">true</str>
>   <str name="breakWords">true</str>
>   <int name="maxChanges">10</int>
> </lst>
>
> The "combineWords" and "breakWords" flags let you tell it which kind of
> wordbreak correction you want.  "maxChanges" controls the maximum number of
> words it can break 1 word into, or the maximum number of words it can
> combine.  It is reasonable to set this to 1 or 2.
>
> The best way to use this is in conjunction with a "regular" spellchecker
> like DirectSolrSpellChecker.  When used together with the collation
> functionality, it should take a query like "mob ile" and, depending on what
> actually returns results from your data, suggest either "mobile" or perhaps
> "mob lie" or both.  The one thing it cannot do is fix a transposition or
> misspelling and combine or break words in one shot.  That is, it cannot
> detect that "mob lie" should become "mobile".
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>


Re: Wordbreak spellchecker excessive breaking.

2014-05-30 Thread S.L
James,

Thanks, there is no error in the logs; it's just that I do not get the
startup message in the log.

I do not see any warm-up-related configuration for any spellchecker in my
solrconfig.xml. I have also pasted the autowarm-related configuration below.



<query>
  <maxBooleanClauses>1024</maxBooleanClauses>

  <filterCache class="..." size="..." initialSize="..." autowarmCount="..."/>
  <queryResultCache class="..." size="..." initialSize="..." autowarmCount="..."/>
  <documentCache class="..." size="..." initialSize="..."/>

  <enableLazyFieldLoading>true</enableLazyFieldLoading>

  <queryResultWindowSize>20</queryResultWindowSize>

  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries"/>
  </listener>

  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">static firstSearcher warming in solrconfig.xml</str>
      </lst>
    </arr>
  </listener>

  <useColdSearcher>false</useColdSearcher>

  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>



On Fri, May 30, 2014 at 10:20 AM, Dyer, James 
wrote:

> I am not sure why changing spellcheck parameters would prevent your server
> from restarting.  One thing to check is to see if you have warming queries
> running that involve spellcheck.  I think I remember from long ago there
> was (maybe still is) an obscure bug where sometimes it will lock up in rare
> cases when spellcheck is used in warming queries.  I do not remember
> exactly what caused this or if it was ever fixed.
>
> Besides that, you might want to post a stack trace or describe what
> happens when it doesn't restart.  Perhaps someone here will know what the
> problem is.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>


DirectSpellChecker not returning expected suggestions.

2014-05-30 Thread S.L
Hi All,

I have a small test index of 400 documents; it happens to have an entry
for "wrangler". When I search for "wranglr", I correctly get the collation
suggestion "wrangler"; however, when I search for "wrangle", I do not
get a suggestion for "wrangler".

The Levenshtein distance between wrangle --> wrangler is the same as the
Levenshtein distance between wranglr --> wrangler, so I am just wondering why
I do not get a suggestion for wrangle.

Below is my Direct spell checker configuration.


<lst name="spellchecker">
  <str name="name">direct</str>
  <str name="field">suggestAggregate</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="distanceMeasure">internal</str>
  <str name="comparatorClass">score</str>
  <float name="accuracy">0.7</float>
  <int name="maxEdits">1</int>
  <int name="minPrefix">3</int>
  <int name="maxInspections">5</int>
  <int name="minQueryLength">4</int>
  <float name="maxQueryFrequency">0.01</float>
</lst>



Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
Anyone ?




Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
I do not get any suggestion (when I search for "wrangle"); however, I
correctly get the suggestion "wrangler" when I search for "wranglr". I am
using the Direct and WordBreak spellcheckers in combination; I have not
tried using anything else.

Is Solr's distance calculation different from the Levenshtein distance
calculation? I have set maxEdits to 1, assuming that this corresponds to
the maximum distance.

Thanks for your help!
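The edit-distance claim is easy to check independently. A plain Levenshtein implementation (written here from the textbook definition, not Solr's internal automaton-based distance) shows both candidates are a single edit from "wrangler", so maxEdits=1 alone does not explain the missing suggestion; stemming or term-frequency thresholds like maxQueryFrequency are likelier causes.

```python
# Plain Levenshtein distance: both "wranglr" and "wrangle" are one edit
# away from "wrangler", so a maxEdits=1 setting admits both. Standard
# dynamic-programming implementation, not Solr's code.

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("wranglr", "wrangler"))  # 1
print(levenshtein("wrangle", "wrangler"))  # 1
```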


On Mon, Jun 2, 2014 at 1:54 PM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> What do you get then?  Suggestions, but not the one you’re looking for, or
> is it deemed correctly spelled?
>
> Have you tried another spellChecker impl, for troubleshooting purposes?
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>


Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
OK, I just realized that "wrangle" is a proper English word; probably that's
why I don't get a suggestion for "wrangler" in this case. However, in my
test index there is no "wrangle" present, so even though this is a proper
English word, since there is no occurrence of it in the index shouldn't
Solr suggest "wrangler" to me?




Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
Thanks. You mean "wrangler" has been stemmed to "wrangle"? If that's the
case, then why does it not return any results for "wrangle"?


On Mon, Jun 2, 2014 at 2:07 PM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> It appears to be stemmed.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>


Re: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread S.L
James,

I get no results back and no suggestions for "wrangle"; however, I get
suggestions for "wranglr", and "wrangle" is not present in my index.

I am just searching for "wrangle" in a field that is created by copying
other fields; as to how that field is analyzed, I don't have access to it
right now.
Thanks.


On Mon, Jun 2, 2014 at 2:48 PM, Dyer, James 
wrote:

> If "wrangle" is not in your index, and if it is within the max # of edits,
> then it should suggest it.
>
> Are you getting anything back from spellcheck at all?  What is the exact
> query you are using?  How is the spellcheck field analyzed?  If you're
> using stemming, then "wrangle" and "wrangler" might be stemmed to the same
> word. (by the way, you shouldn't spellcheck against a stemmed or otherwise
> heavily-analyzed field).
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>


Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi Folks,

I recently started using the spellchecker in my solrconfig.xml. I am able to 
build up an index in Solr.

But, if I ever shut down Tomcat, I am not able to restart it. The server never
prints the server startup time in seconds in the logs, nor does it print any
error messages in the catalina.out file.

The only way for me to get around this is by deleting the data directory of the
index and then starting the server; obviously this makes me lose my index.

Just wondering if anyone has faced a similar issue and was able to solve it.

Thanks.



Re: Strange Behavior with Solr in Tomcat.

2014-06-04 Thread S.L
Hi,

This is not a case of accidental deletion; the only way I can restart
Tomcat is by deleting the data directory for the index that was created
earlier. This started happening after I started using spellcheckers in my
solrconfig.xml. As long as Tomcat is running, it is fine.

Any help from anyone who has faced a similar issue would be appreciated.

Thanks.



On Wed, Jun 4, 2014 at 11:08 AM, Aman Tandon  wrote:

> I guess if you try to copy the index and then kill the process of tomcat
> then it might help. If still the index need to be delete you would have the
> back up. Next time always make back up.


Re: Strange Behavior with Solr in Tomcat.

2014-06-06 Thread S.L
Anyone folks?




Re: Strange Behavior with Solr in Tomcat.

2014-06-07 Thread S.L
Thanks, Meraj, that was exactly the issue; setting
<useColdSearcher>true</useColdSearcher> worked like a charm and the server
starts up as usual.

Thanks again!


On Fri, Jun 6, 2014 at 2:42 PM, Meraj A. Khan  wrote:

> This looks distinctly related to
> https://issues.apache.org/jira/browse/SOLR-4408 , try coldSearcher = true
> as being suggested in JIRA and let us know .
>
>
> On Fri, Jun 6, 2014 at 2:39 PM, Jean-Sebastien Vachon <
> jean-sebastien.vac...@wantedanalytics.com> wrote:
>
> > I would try a thread dump and check the output to see what's going on.
> > You could also strace the process if you're running on Unix, or change
> > the log level in Solr to get more information logged.
> >


Re: Is it possible for solr to calculate and give back the price of a product based on its sub-products

2014-06-08 Thread S.L
I am not sure that is doable at query time; I think it needs to be taken
care of at indexing time.


On Sun, Jun 8, 2014 at 4:55 PM, Gharbi Mohamed <
gharbi.mohamed.e...@gmail.com> wrote:

> Hi,
>
> I am using Solr for searching Magento products in my project.
> I want to know: is it possible for Solr to calculate and give back the
> price of a product based on its sub-products (items)?
>
> For instance, I have a product P1, and it is the parent of items m1 and m2.
> I need to get the minimal price of the items and return it as the price of
> product P1.
>
> I'm wondering if that is possible?
> I need to know if Solr can do that, or if there is a feature or a way to do
> it. And finally, I thank you!
>
> regards,
> Mohamed.
>
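Handling this at indexing time, as suggested in the reply above, can be as simple as deriving the parent's price from its items before the document is sent to Solr. A minimal sketch, with an illustrative document shape (the field names are assumptions, not Magento's or Solr's schema):

```python
# Sketch of index-time derivation: compute the parent product's price as
# the minimum of its items' prices before indexing the document. Field
# names are illustrative.

def with_derived_price(product, items):
    doc = dict(product)
    doc["price"] = min(item["price"] for item in items)
    return doc

p1 = {"id": "P1", "name": "Parent product"}
items = [{"id": "m1", "price": 19.99}, {"id": "m2", "price": 14.99}]
print(with_derived_price(p1, items)["price"])  # 14.99
```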
>


Spell checker - limit on number of misspelt words in a search term.

2014-06-17 Thread S.L
Hi All,

I am using the Direct Spell checker component and I have collate =true in
my solrconfig.xml.

The issue I noticed is that when I have a search term with up to two
words in it and both of them are misspelled, I get a collation query as a
suggestion in the spellchecker output. If I increase the search term
length to 3 words and misspell all of them, I no longer get a collation
query in the spellchecker suggestions.

Is there a setting in solrconfig.xml that controls this behavior by
restricting the search term to at most two misspelt words when suggesting a
collation query? If so, I need to change that property.

Can anyone please let me know how to do so?

Thanks.

Sent from my mobile.
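Collation behaviour of this kind is typically governed by the spellcheck request parameters rather than a word-count limit: with three misspelled terms, Solr needs enough per-term suggestions (spellcheck.count) and enough correction combinations to test against the index (spellcheck.maxCollationTries) before a collation can be found. The sketch below just builds such a request query string; the parameter values and the query terms are illustrative, and raising these values is a likely fix rather than a guaranteed one.

```python
# Sketch of the request parameters that usually control collation. With
# more misspelled terms, more combinations must be tried, so raising
# spellcheck.count and spellcheck.maxCollationTries is the common tuning.
# Values and query are illustrative.

from urllib.parse import urlencode

params = {
    "q": "wranglr jaket blakc",          # three misspelled terms
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.count": 10,              # suggestions per term
    "spellcheck.maxCollations": 5,       # collations to return
    "spellcheck.maxCollationTries": 50,  # combinations tested against the index
}
query_string = urlencode(params)
print(query_string)
```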


Crawl-Delay in robots.txt and fetcher.threads.per.queue property in Nutch

2014-06-25 Thread S.L
Hello All

If I set the fetcher.threads.per.queue property to more than 1, I believe the 
behavior would be to have that many threads per host in Nutch. In that case, 
would Nutch still respect the Crawl-Delay directive in robots.txt and not 
crawl at a faster pace than what is specified there?

In short, what I am trying to ask is whether setting fetcher.threads.per.queue 
to 1 is required to be as polite as the Crawl-Delay in robots.txt expects.

Thx



SolrJ 503 Error

2013-12-21 Thread S.L
Hi All,

I am running a single Solr instance, version 4.4, with Apache
Tomcat 7.0.42. I am also running a Nutch instance with about 20
threads, and each thread commits a document to the Solr index
using the SolrJ API; the version of the SolrJ API I

use is 4.3.1. Can anyone please let me know if this error is occurring
because I am committing documents too fast for a single instance of a
server, or because of some other underlying issue? Please let me
know.

Thanks.



org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Server at http://localhost:8081/solr returned non ok status:503,
message:Service Unavailable
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
~[DynaOCrawlerUtils.jar:?]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
~[DynaOCrawlerUtils.jar:?]
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
~[solr-solrj-3.4.0.jar:3.4.0 1167142 - mike - 2011-09-09 09:06:50]
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
~[solr-solrj-3.4.0.jar:3.4.0 1167142 - mike - 2011-09-09 09:06:50]
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
~[solr-solrj-3.4.0.jar:3.4.0 1167142 - mike - 2011-09-09 09:06:50]
at 
com.xyz.DynaOCrawlerUtils.SolrDynaOUtils.createSolrInputDocumentAndPopulateSolrIndex(SolrDynaOUtils.java:101)
~[DynaOCrawlerUtils.jar:?]
at 
com.xyz.DynaOCrawlerUtils.SolrCallbackForNXParser.populateModelToSolrIndex(SolrCallbackForNXParser.java:216)
[DynaOCrawlerUtils.jar:?]
at 
com.xyz.DynaOCrawlerUtils.SolrCallbackForNXParser.endDocument(SolrCallbackForNXParser.java:87)
[DynaOCrawlerUtils.jar:?]
at 
com.xyz.DynaOCrawlerUtils.SolrDynaOUtils.populateSolrIndexFromCurrentURL(SolrDynaOUtils.java:250)
[DynaOCrawlerUtils.jar:?]
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:716)
[job.jar:?]


Re: SolrJ 503 Error

2013-12-21 Thread S.L
I have an 8GB machine, and I commit for each and every document that is
added to Solr. Not sure if I am missing anything here, but from your
response it seems I could use autocommit; in that case, do I no longer
need to call commit? Can you please point me to a resource that explains
this?
Thanks.
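A common alternative to committing per document is to batch additions on the client and let Solr's `<autoCommit>` (or commitWithin) handle durability. The sketch below is a hypothetical client-side batching helper, not SolrJ code: the `send` callback stands in for whatever actually pushes a batch (e.g. SolrJ's add of a document collection), and the batch size is an illustrative choice.

```python
# Sketch of client-side batching instead of a commit() per document. The
# `send` callback stands in for the real indexing call (e.g. SolrJ's
# server.add(docs)); names and sizes are illustrative. In practice,
# configuring <autoCommit> in solrconfig.xml and dropping explicit
# commit() calls is the usual cure for 503s under indexing load.

class BatchingIndexer:
    def __init__(self, send, batch_size=100):
        self.send = send          # callback taking a list of documents
        self.batch_size = batch_size
        self.buffer = []

    def add(self, doc):
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []

sent = []
indexer = BatchingIndexer(sent.append, batch_size=2)
for i in range(5):
    indexer.add({"id": i})
indexer.flush()
print([len(batch) for batch in sent])  # [2, 2, 1]
```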


On Sat, Dec 21, 2013 at 2:48 PM, Andrea Gazzarini wrote:

> Not sure if we have the same scenario, but I got the same error code when I
> was trying to do a lot of requests (updates and queries) with 10 secs of
> (hard) autocommit to a Solr instance running in a servlet engine (Tomcat)
> with few resources (if I remember, no more than 1GB of RAM)
>
> Andrea


Intermittent error indexing SolrCloud 4.7.0

2014-08-19 Thread S.L
Hi All,

I get a "No live SolrServers available to handle this request" error
intermittently while indexing into a SolrCloud cluster with 3 shards and a
replication factor of 2.

I am using Solr 4.7.0.

Please see the stack trace below.

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:352)
~[DynaOCrawlerUtils.jar:?]
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640)
~[DynaOCrawlerUtils.jar:?]
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
~[DynaOCrawlerUtils.jar:?]
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
~[DynaOCrawlerUtils.jar:?]
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
~[DynaOCrawlerUtils.jar:?]


pySolr and other Python client options for SolrCloud.

2014-10-01 Thread S.L
Hi All,

We recently moved from a single Solr instance to SolrCloud, and we are using
pysolr. I am wondering what client options we have in Python to take
advantage of the ZooKeeper and load balancing capabilities of SolrCloud,
similar to what I would get with a smart client like SolrJ?

Thanks.


Re: pySolr and other Python client options for SolrCloud.

2014-10-01 Thread S.L
Right, but my question was whether there are any Python clients that
achieve the same thing as SolrJ, or what approach one should take when
using Python-based clients.

On Wed, Oct 1, 2014 at 3:57 PM, Upayavira  wrote:

>
>
> On Wed, Oct 1, 2014, at 08:47 PM, S.L wrote:
> > Hi All,
> >
> > We recently moved from a single Solr instance to SolrCloud and we are
> > using
> > pysolr , I am wondering what options (clients)  we have from Python  to
> > take advantage of Zookeeper and load balancing capabilities that
> > SolrCloud
> > provides if I were to use a smart client like Solrj?
>
> Obviously SolrJ is Java, not Python. SolrJ has integration with
> Zookeeper, so when you instantiate a CloudSolrServer instance, you tell
> it where Zookeeper is, not Solr. Your app then consults Zookeeper to
> find out which Solr instance to talk to.
>
> This means you can move stuff around within your infrastructure without
> needing to tell your app, and without needing to mess with load
> balancers as that is all handled for you by the SolrJ client deciding
> which node to forward your request.
>
> Upayavira
>
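
[A ZooKeeper-unaware Python client can approximate SolrJ's failover behaviour
by trying a list of node URLs in turn. The sketch below is illustrative only:
the transport function is injected so the logic can be exercised without a live
cluster, and the host names are hypothetical.]

```python
def query_with_failover(urls, do_request):
    """Try each Solr node URL in order; return the first successful response.

    do_request(url) should return the decoded response, or raise on failure.
    """
    last_error = None
    for url in urls:
        try:
            return do_request(url)
        except Exception as exc:  # in practice, catch the client's HTTP errors
            last_error = exc
    raise RuntimeError("no live Solr servers available") from last_error


# Example with a fake transport: the first node is "down".
nodes = ["http://server1:8081/solr", "http://server2:8081/solr"]

def fake_request(url):
    if "server1" in url:
        raise ConnectionError("503 Service Unavailable")
    return {"responseHeader": {"status": 0}, "served_by": url}

result = query_with_failover(nodes, fake_request)
print(result["served_by"])  # falls through to server2
```

[This is what a load balancer does for you externally; a ZooKeeper-aware
client additionally refreshes the URL list from the cluster state.]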


Re: pySolr and other Python client options for SolrCloud.

2014-10-01 Thread S.L
Shawn,

Thanks. A load balancer seems to be the preferred solution here. I have a
topology with 6 Solr nodes that support 3 shards with a replication factor
of 2.

It looks like it would be better to use the load balancer for querying
only. The question I have is: if I go the load balancer route, should I
list all six nodes in the load balancer or only the leaders as identified
by the SolrCloud admin console? Would the load balancing solution also
incur additional routing of requests between the individual nodes of
SolrCloud that would not have happened had the Python Solr client been
ZooKeeper-aware?

Also, for indexing, which is done not from a Python client but using
SolrJ, I will avoid the load balancer and index via the ZooKeeper route.

Thanks.

On Wed, Oct 1, 2014 at 8:42 PM, Shawn Heisey  wrote:

> On 10/1/2014 2:29 PM, S.L wrote:
> > Right , but my query was to know if there are any Python clients which
> > achieve the same thing as SolrJ  , or the approach one should take when
> > using Python based clients.
>
> If the python client can support multiple hosts and failing over between
> them, then you would simply list multiple URLs.  If not, then you'll
> need a load balancer.  I use haproxy with Solr (not in Cloud mode) for
> automatic failover, and it should work equally well for SolrCloud and a
> non-java client.
>
> It looks like Alexandre knows a lot more about it than I do ... I know
> very little about python.
>
> Thanks,
> Shawn
>
>
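
[As a concrete illustration of Shawn's haproxy suggestion, a minimal backend
listing all six nodes might look like the sketch below. Host names, ports, and
the health-check path are hypothetical placeholders, not a tuned production
config.]

```
frontend solr_front
    bind *:8983
    default_backend solr_nodes

backend solr_nodes
    balance roundrobin
    # health check path is illustrative; adjust to a ping handler you expose
    option httpchk GET /solr/admin/ping
    server solr1 server1.mydomain.com:8081 check
    server solr2 server2.mydomain.com:8081 check
    server solr3 server3.mydomain.com:8081 check
    server solr4 server1.mydomain.com:8082 check
    server solr5 server2.mydomain.com:8082 check
    server solr6 server3.mydomain.com:8082 check
```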


Re: pySolr and other Python client options for SolrCloud.

2014-10-01 Thread S.L
That makes perfect sense , thanks again!

On Wed, Oct 1, 2014 at 10:09 PM, Shawn Heisey  wrote:

> On 10/1/2014 7:08 PM, S.L wrote:
> > Thanks ,load balancer seems to be the preferred solution here , I have a
> > topology where I have 6 Solr nodes that support 3 shards with a
> replication
> > factor of 2.
> >
> > Looks like it woul dbe better to use the load balancers for querying
> > only.The question, that I have is if I go the load balancer route should
> I
> > be listing all the six nodes in the load balancer or only the leaders as
> > identified by SolrCloud admin console?Would the load balancing solution
> > also incur any additional routing of requests between the individual
> nodes
> > of SolrCloud that would have not happened had the python Solr client been
> > zookeeper aware?
> >
> > Also for indexing ,which is not done from a Python client but is done
> using
> > Solrj, I will avoid the load balancers and do the indexing  it via the
> > Zookeeper route.
>
> If you were to send all your queries to just one server, it's my
> understanding that SolrCloud will load balance the actual work across
> the cloud.  I have not verified this.
>
> For a load balancer, the minimum requirement would be to list two of the
> servers, but it's probably better to list them all.  Leader designations
> can change, and I'm pretty sure you don't want to change your load
> balancer config just because the leader changed.
>
> If your 3 shards are using automatic document routing, then you can send
> updates to any machine in the cluster and they'll end up in the right
> place.  Since you're using SolrJ for updates, this is probably not
> something you need to worry about.
>
> Thanks,
> Shawn
>
>


SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-02 Thread S.L
Hi All,

I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a
replication factor of 2.

I have fronted these 6 Solr nodes with a load balancer. What I notice is
that when I do a search of the form
q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf), it returns a result
only once in every 3 tries, telling me that the load balancer is
distributing the requests between the 3 shards and SolrCloud only returns a
result if the request happens to go to the core that has that id.

However, if I do a simple search like q=*:*, I consistently get the correct
aggregated results for all the documents across all the shards for every
request through the load balancer. Can someone please let me know what this
is symptomatic of?

Somehow SolrCloud seems to be doing search query distribution and
aggregation only for queries of type *:*.

Thanks.


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-02 Thread S.L
Erick,

Thanks for your reply. I tried your suggestions.

1. When not using the load balancer, if *I have distrib=false* I get
consistent results across the replicas.

2. However, here's the interesting part: while not using the load
balancer, if I *don't have distrib=false*, then when I query a particular
node I get the same behaviour as if I were using a load balancer, meaning
the distributed search from a node works only intermittently. Does this
give any clue?



On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson 
wrote:

> Hmmm, nothing quite makes sense here
>
> Here are some experiments:
> 1> avoid the load balancer and issue queries like
> http://solr_server:8983/solr/collection/q=whatever&distrib=false
>
> the &distrib=false bit will cause keep SolrCloud from trying to send
> the queries anywhere, they'll be served only from the node you address
> them to.
> that'll help check whether the nodes are consistent. You should be
> getting back the same results from each replica in a shard (i.e. 2 of
> your 6 machines).
>
> Next, try your failing query the same way.
>
> Next, try your failing query from a browser, pointing it at successive
> nodes.
>
> Where is the first place problems show up?
>
> My _guess_ is that your load balancer isn't quite doing what you think, or
> your cluster isn't set up the way you think it is, but those are guesses.
>
> Best,
> Erick
>
> On Thu, Oct 2, 2014 at 2:51 PM, S.L  wrote:
> > Hi All,
> >
> > I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
> > replication factor of 2 .
> >
> > I have fronted these 6 Solr nodes using a load balancer , what I notice
> is
> > that every time I do a search of the form
> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
> > only once in every 3 tries , telling me that the load balancer is
> > distributing the requests between the 3 shards and SolrCloud only
> returns a
> > result if the request goes to the core that as that id .
> >
> > However if I do a simple search like q=*:* , I consistently get the right
> > aggregated results back of all the documents across all the shards for
> > every request from the load balancer. Can someone please let me know what
> > this is symptomatic of ?
> >
> > Somehow Solr Cloud seems to be doing search query distribution and
> > aggregation for queries of type *:* only.
> >
> > Thanks.
>


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-02 Thread S.L
Erick,

I would like to add that the interesting behavior, i.e. point #2 that I
mentioned in my earlier reply, happens in all the shards. If this were a
distributed search issue, it should not have manifested itself in the
shard that contains the key I am searching for; it looks like the search
is just failing as a whole, intermittently.

Also, the collection is being actively indexed as I query it; could that
be an issue too?

Thanks.

On Thu, Oct 2, 2014 at 10:24 PM, S.L  wrote:

> Erick,
>
> Thanks for your reply, I tried your suggestions.
>
> 1 . When not using loadbalancer if  *I have distrib=false* I get
> consistent results across the replicas.
>
> 2. However here's the insteresting part , while not using load balancer if
> I *dont have distrib=false* , then when I query a particular node ,I get
> the same behaviour as if I were using a loadbalancer , meaning the
> distributed search from a node works intermittently .Does this give any
> clue ?
>
>
>
> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson 
> wrote:
>
>> Hmmm, nothing quite makes sense here
>>
>> Here are some experiments:
>> 1> avoid the load balancer and issue queries like
>> http://solr_server:8983/solr/collection/q=whatever&distrib=false
>>
>> the &distrib=false bit will cause keep SolrCloud from trying to send
>> the queries anywhere, they'll be served only from the node you address
>> them to.
>> that'll help check whether the nodes are consistent. You should be
>> getting back the same results from each replica in a shard (i.e. 2 of
>> your 6 machines).
>>
>> Next, try your failing query the same way.
>>
>> Next, try your failing query from a browser, pointing it at successive
>> nodes.
>>
>> Where is the first place problems show up?
>>
>> My _guess_ is that your load balancer isn't quite doing what you think, or
>> your cluster isn't set up the way you think it is, but those are guesses.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 2, 2014 at 2:51 PM, S.L  wrote:
>> > Hi All,
>> >
>> > I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
>> > replication factor of 2 .
>> >
>> > I have fronted these 6 Solr nodes using a load balancer , what I notice
>> is
>> > that every time I do a search of the form
>> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf)  it gives me a result
>> > only once in every 3 tries , telling me that the load balancer is
>> > distributing the requests between the 3 shards and SolrCloud only
>> returns a
>> > result if the request goes to the core that as that id .
>> >
>> > However if I do a simple search like q=*:* , I consistently get the
>> right
>> > aggregated results back of all the documents across all the shards for
>> > every request from the load balancer. Can someone please let me know
>> what
>> > this is symptomatic of ?
>> >
>> > Somehow Solr Cloud seems to be doing search query distribution and
>> > aggregation for queries of type *:* only.
>> >
>> > Thanks.
>>
>
>


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-02 Thread S.L
Erick,

0> The load balancer is out of the picture.

1> When I query with *distrib=false*, I get consistent results as expected
for the shards that don't have the key, i.e. I don't get results back from
those shards. However, I just realized that with *distrib=false* in the
query against the shard that is supposed to contain the key, only the
replica of that shard returns the result and the leader does not. It looks
like the replica and the leader do not have the same data, and only the
replica contains the key for that shard.

2> By indexing I mean this collection is being populated by a web crawler.

So 1> above points to the leader and the replica being out of sync for at
least one shard.


On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson 
wrote:

> bq: Also ,the collection is being actively indexed as I query this, could
> that
> be an issue too ?
>
> Not if the documents you're searching aren't being added as you search
> (and all your autocommit intervals have expired).
>
> I would turn off indexing for testing, it's just one more variable
> that can get in the way of understanding this.
>
> Do note that if the problem were endemic to Solr, there would probably
> be a _lot_ more noise out there.
>
> So to recap:
> 0> we can take the load balancer out of the picture all together.
>
> 1> when you query each shard individually with &distrib=true, every
> replica in a particular shard returns the same count.
>
> 2> when you query without &distrib=true you get varying counts.
>
> This is very strange and not at all expected. Let's try it again
> without indexing going on
>
> And what do you mean by "indexing" anyway? How are documents being fed
> to your system?
>
> Best,
> Erick@PuzzledAsWell
>
> On Thu, Oct 2, 2014 at 7:32 PM, S.L  wrote:
> > Erick,
> >
> > I would like to add that the interesting behavior i.e point #2 that I
> > mentioned in my earlier reply  happens in all the shards , if this were
> to
> > be a distributed search issue this should have not manifested itself in
> the
> > shard that contains the key that I am searching for , looks like the
> search
> > is just failing as whole intermittently .
> >
> > Also ,the collection is being actively indexed as I query this, could
> that
> > be an issue too ?
> >
> > Thanks.
> >
> > On Thu, Oct 2, 2014 at 10:24 PM, S.L  wrote:
> >
> >> Erick,
> >>
> >> Thanks for your reply, I tried your suggestions.
> >>
> >> 1 . When not using loadbalancer if  *I have distrib=false* I get
> >> consistent results across the replicas.
> >>
> >> 2. However here's the insteresting part , while not using load balancer
> if
> >> I *dont have distrib=false* , then when I query a particular node ,I get
> >> the same behaviour as if I were using a loadbalancer , meaning the
> >> distributed search from a node works intermittently .Does this give any
> >> clue ?
> >>
> >>
> >>
> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson  >
> >> wrote:
> >>
> >>> Hmmm, nothing quite makes sense here
> >>>
> >>> Here are some experiments:
> >>> 1> avoid the load balancer and issue queries like
> >>> http://solr_server:8983/solr/collection/q=whatever&distrib=false
> >>>
> >>> the &distrib=false bit will cause keep SolrCloud from trying to send
> >>> the queries anywhere, they'll be served only from the node you address
> >>> them to.
> >>> that'll help check whether the nodes are consistent. You should be
> >>> getting back the same results from each replica in a shard (i.e. 2 of
> >>> your 6 machines).
> >>>
> >>> Next, try your failing query the same way.
> >>>
> >>> Next, try your failing query from a browser, pointing it at successive
> >>> nodes.
> >>>
> >>> Where is the first place problems show up?
> >>>
> >>> My _guess_ is that your load balancer isn't quite doing what you
> think, or
> >>> your cluster isn't set up the way you think it is, but those are
> guesses.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L  wrote:
> >>> > Hi All,
> >>> >
> >>> > I am trying to query a 6 node Solr4.7  cluster with 3 shards and  a
> >>> > replication factor of 2 .
> >>>

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-06 Thread S.L
Hi Erick,

Before I tried your suggestion of issuing a commit=true update, I realized
that for each shard there was at least one node that had its index
directory named like index.<timestamp>.

I went ahead and deleted that index directory and restarted the core, and
now the index directory is synced with the other node and is properly
named 'index' without any timestamp attached to it. This now gives me
consistent results for distrib=true through a load balancer, and
distrib=false returns the expected results for a given shard.

The underlying issue appears to be that in every shard the leader and the
replica (follower) were out of sync.

How can I avoid this from happening again?

Thanks for your help!

Sent from my HTC

- Reply message -
From: "Erick Erickson" 
To: 
Subject: SolrCloud 4.7 not doing distributed search when querying from a load 
balancer.
Date: Fri, Oct 3, 2014 12:56 AM

H. Assuming that you aren't re-indexing the doc you're searching for...

Try issuing http://blah blah:8983/solr/collection/update?commit=true.
That'll force all the docs to be searchable. Does <1> still hold for
the document in question? Because this is exactly backwards of what
I'd expect. I'd expect, if anything, the replica (I'm trying to call
it the "follower" when a distinction needs to be made since the leader
is a "replica" too) would be out of sync. This is still a Bad
Thing, but the leader gets first crack at indexing thing.

bq: only the replica of the shard that has this key returns the result
, and the leader does not ,

Just to be sure we're talking about the same thing. When you say
"leader", you mean the shard leader, right? The filled-in circle on
the graph view from the admin/cloud page.

And let's see your soft and hard commit settings please.

Best,
Erick
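
[For reference, the hard and soft commit settings Erick asks about live in
the <updateHandler> section of solrconfig.xml. The values below are purely
illustrative, not a recommendation for this cluster:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes segments to stable storage, bounds tlog growth -->
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- every 60 seconds -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher -->
  </autoCommit>
  <!-- soft commit: controls when new documents become searchable -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>            <!-- every 15 seconds -->
  </autoSoftCommit>
</updateHandler>
```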

On Thu, Oct 2, 2014 at 9:48 PM, S.L  wrote:
> Eirck,
>
> 0> Load balancer is out of the picture
> .
> 1>When I query with *distrib=false* , I get consistent results as expected
> for those shards that dont have the key i.e I dont get the results back for
> those shards, however I just realized that while *distrib=false* is present
> in the query for the shard that is supposed to contain the key,only the
> replica of the shard that has this key returns the result , and the leader
> does not , looks like replica and the leader do not have the same data and
> replica seems to contain the key in the query for that shard.
>
> 2> By indexing I mean this collection is being populated by a web crawler.
>
> So looks like 1> above  is pointing to leader and replica being out of
> synch for atleast one shard.
>
>
>
> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson 
> wrote:
>
>> bq: Also ,the collection is being actively indexed as I query this, could
>> that
>> be an issue too ?
>>
>> Not if the documents you're searching aren't being added as you search
>> (and all your autocommit intervals have expired).
>>
>> I would turn off indexing for testing, it's just one more variable
>> that can get in the way of understanding this.
>>
>> Do note that if the problem were endemic to Solr, there would probably
>> be a _lot_ more noise out there.
>>
>> So to recap:
>> 0> we can take the load balancer out of the picture all together.
>>
>> 1> when you query each shard individually with &distrib=true, every
>> replica in a particular shard returns the same count.
>>
>> 2> when you query without &distrib=true you get varying counts.
>>
>> This is very strange and not at all expected. Let's try it again
>> without indexing going on
>>
>> And what do you mean by "indexing" anyway? How are documents being fed
>> to your system?
>>
>> Best,
>> Erick@PuzzledAsWell
>>
>> On Thu, Oct 2, 2014 at 7:32 PM, S.L  wrote:
>> > Erick,
>> >
>> > I would like to add that the interesting behavior i.e point #2 that I
>> > mentioned in my earlier reply  happens in all the shards , if this were
>> to
>> > be a distributed search issue this should have not manifested itself in
>> the
>> > shard that contains the key that I am searching for , looks like the
>> search
>> > is just failing as whole intermittently .
>> >
>> > Also ,the collection is being actively indexed as I query this, could
>> that
>> > be an issue too ?
>> >
>> > Thanks.
>> >
>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L  wrote:
>> >
>> >> Erick,
>> >>
>> >> Thanks for your reply, I tried your suggestions.
>> >>
>&

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-06 Thread S.L
Erick,

Thanks for the suggestion. I am not sure I would be able to capture what
went wrong, so upgrading to 4.10 seems easier even though it means a day's
worth of effort :). I will go ahead and upgrade and let you know, although
I am surprised that this issue was never reported for 4.7 until now.

Thanks again for your help!



On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson 
wrote:

> I think there were some holes that would allow replicas and leaders to
> be out of synch that have been patched up in the last 3 releases.
>
> There shouldn't be anything you need to do to keep these in synch, so
> if you can capture what happened when things got out of synch we'll
> fix it. But a lot has changed in the last several months, so the first
> thing I'd do if possible is to upgrade to 4.10.1.
>
>
> Best,
> Erick
>
> On Mon, Oct 6, 2014 at 2:41 PM, S.L  wrote:
> > Hi Erick,
> >
> > Before I tried your suggestion of  issung a commit=true update, I
> realized that for eaach shard there was atleast a node that had its index
> directory named like index..
> >
> > I went ahead and deleted index directory that restarted that core and
> now the index directory got syched with the other node and is properly
> named as 'index' without any timestamp attached to it.This is now giving me
> consistent results for distrib=true using a load balancer.Also
> distrib=false returns expexted results for a given shard.
> >
> > The underlying issue appears to be that in every shard the leader and
> the replica(follower) were out of sych.
> >
> > How can I avoid this from happening again?
> >
> > Thanks for your help!
> >
> > Sent from my HTC
> >
> > - Reply message -
> > From: "Erick Erickson" 
> > To: 
> > Subject: SolrCloud 4.7 not doing distributed search when querying from a
> load balancer.
> > Date: Fri, Oct 3, 2014 12:56 AM
> >
> > H. Assuming that you aren't re-indexing the doc you're searching
> for...
> >
> > Try issuing http://blah blah:8983/solr/collection/update?commit=true.
> > That'll force all the docs to be searchable. Does <1> still hold for
> > the document in question? Because this is exactly backwards of what
> > I'd expect. I'd expect, if anything, the replica (I'm trying to call
> > it the "follower" when a distinction needs to be made since the leader
> > is a "replica" too) would be out of sync. This is still a Bad
> > Thing, but the leader gets first crack at indexing thing.
> >
> > bq: only the replica of the shard that has this key returns the result
> > , and the leader does not ,
> >
> > Just to be sure we're talking about the same thing. When you say
> > "leader", you mean the shard leader, right? The filled-in circle on
> > the graph view from the admin/cloud page.
> >
> > And let's see your soft and hard commit settings please.
> >
> > Best,
> > Erick
> >
> > On Thu, Oct 2, 2014 at 9:48 PM, S.L  wrote:
> >> Eirck,
> >>
> >> 0> Load balancer is out of the picture
> >> .
> >> 1>When I query with *distrib=false* , I get consistent results as
> expected
> >> for those shards that dont have the key i.e I dont get the results back
> for
> >> those shards, however I just realized that while *distrib=false* is
> present
> >> in the query for the shard that is supposed to contain the key,only the
> >> replica of the shard that has this key returns the result , and the
> leader
> >> does not , looks like replica and the leader do not have the same data
> and
> >> replica seems to contain the key in the query for that shard.
> >>
> >> 2> By indexing I mean this collection is being populated by a web
> crawler.
> >>
> >> So looks like 1> above  is pointing to leader and replica being out of
> >> synch for atleast one shard.
> >>
> >>
> >>
> >> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <
> erickerick...@gmail.com>
> >> wrote:
> >>
> >>> bq: Also ,the collection is being actively indexed as I query this,
> could
> >>> that
> >>> be an issue too ?
> >>>
> >>> Not if the documents you're searching aren't being added as you search
> >>> (and all your autocommit intervals have expired).
> >>>
> >>> I would turn off indexing for testing, it's just one more variable
> >>> that can get in the way of underst

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-13 Thread S.L
t it'd be a good place to start.
>
> Things have been reported off and on, but they're often pesky race
> conditions or something else that takes a long time to track down, you
> just are lucky perhaps ;)...
>
> Erick
>
> On Mon, Oct 6, 2014 at 8:04 PM, S.L  wrote:
> > Erick,
> >
> > Thanks for the suggestion , I am not sure if I would be able to capture
> > what went wrong , so upgrading to 4.10 seems easier even though it means
> ,
> > a days work of effort :) . I will go ahead and upgrade and let me know ,
> > although I am surprised that this issue never got reported for 4.7 up
> until
> > now.
> >
> > Thanks again for your help!
> >
> >
> >
> > On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson  >
> > wrote:
> >
> >> I think there were some holes that would allow replicas and leaders to
> >> be out of synch that have been patched up in the last 3 releases.
> >>
> >> There shouldn't be anything you need to do to keep these in synch, so
> >> if you can capture what happened when things got out of synch we'll
> >> fix it. But a lot has changed in the last several months, so the first
> >> thing I'd do if possible is to upgrade to 4.10.1.
> >>
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Oct 6, 2014 at 2:41 PM, S.L  wrote:
> >> > Hi Erick,
> >> >
> >> > Before I tried your suggestion of  issung a commit=true update, I
> >> realized that for eaach shard there was atleast a node that had its
> index
> >> directory named like index..
> >> >
> >> > I went ahead and deleted index directory that restarted that core and
> >> now the index directory got syched with the other node and is properly
> >> named as 'index' without any timestamp attached to it.This is now
> giving me
> >> consistent results for distrib=true using a load balancer.Also
> >> distrib=false returns expexted results for a given shard.
> >> >
> >> > The underlying issue appears to be that in every shard the leader and
> >> the replica(follower) were out of sych.
> >> >
> >> > How can I avoid this from happening again?
> >> >
> >> > Thanks for your help!
> >> >
> >> > Sent from my HTC
> >> >
> >> > - Reply message -
> >> > From: "Erick Erickson" 
> >> > To: 
> >> > Subject: SolrCloud 4.7 not doing distributed search when querying
> from a
> >> load balancer.
> >> > Date: Fri, Oct 3, 2014 12:56 AM
> >> >
> >> > H. Assuming that you aren't re-indexing the doc you're searching
> >> for...
> >> >
> >> > Try issuing http://blah blah:8983/solr/collection/update?commit=true.
> >> > That'll force all the docs to be searchable. Does <1> still hold for
> >> > the document in question? Because this is exactly backwards of what
> >> > I'd expect. I'd expect, if anything, the replica (I'm trying to call
> >> > it the "follower" when a distinction needs to be made since the leader
> >> > is a "replica" too) would be out of sync. This is still a Bad
> >> > Thing, but the leader gets first crack at indexing thing.
> >> >
> >> > bq: only the replica of the shard that has this key returns the result
> >> > , and the leader does not ,
> >> >
> >> > Just to be sure we're talking about the same thing. When you say
> >> > "leader", you mean the shard leader, right? The filled-in circle on
> >> > the graph view from the admin/cloud page.
> >> >
> >> > And let's see your soft and hard commit settings please.
> >> >
> >> > Best,
> >> > Erick
> >> >
> >> > On Thu, Oct 2, 2014 at 9:48 PM, S.L 
> wrote:
> >> >> Eirck,
> >> >>
> >> >> 0> Load balancer is out of the picture
> >> >> .
> >> >> 1>When I query with *distrib=false* , I get consistent results as
> >> expected
> >> >> for those shards that dont have the key i.e I dont get the results
> back
> >> for
> >> >> those shards, however I just realized that while *distrib=false* is
> >> present
> >> >> in the query for the shard that is supposed to contain the key,only
> the
> >> >> rep

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread S.L
5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}

 
 
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
">
   1
   3
   GET_FIELDS,GET_DEBUG
   1
   {responseHeader={status=0,QTime=1,params={spellcheck=false,spellcheck.maxCollationTries=10,distrib=false,debug=[track,
track],version=2,df=suggestAggregate,shard.url=
http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/,NOW=1413398738457,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,ids=http://www.redacted.com/ip/Cutter-Bite,spellcheck.collate=true,wt=javabin,requestPurpose=GET_FIELDS,GET_DEBUG,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398738457-16,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
wordbreak],isShard=true}},response={numFound=1,start=0,docs=[SolrDocument{thingURL=
http://www.redacted.com/ip/Cutter-Bite,
id=e8995da8-7d98-4010-93b4-8ff7dffb8bfb,
_version_=1481991045188157440}]},debug={}}

 
  
   


On Tue, Oct 14, 2014 at 10:32 AM, Tim Potter 
wrote:

> Try adding shards.info=true and debug=track to your queries ... these will
> give more detailed information about what's going behind the scenes.
>
> On Mon, Oct 13, 2014 at 11:11 PM, S.L  wrote:
>
> > Erick,
> >
> > I have upgraded to SolrCloud 4.10.1 with the same toplogy , 3 shards and
> 2
> > replication factor with six cores altogether.
> >
> > Unfortunately , I still see the issue of intermittently no results being
> > returned.I am not able to figure out whats going on here, I have included
> > the logging information below.
> >
> > *Here's the query that I run.*
> >
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true
> >
> >
> >
> > *Scenario 1: No result returned.*
> >
> > *Log Information for Scenario #1 .*
> > 92860314 [http-bio-8081-exec-103] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> > null
> > 92860315 [http-bio-8081-exec-103] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> > null
> > 92860315 [http-bio-8081-exec-103] INFO
> > org.apache.solr.handler.component.SpellCheckComponent  –
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> > null
> > 92860315 [http-bio-8081-exec-103] INFO  org.apache.solr.core.SolrCore  –
> > [dyCollection1_shard2_replica1] webapp=/solr path=/select/
> >
> >
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> > hits=0 status=0 QTime=5
> >
> > *Scenario #2 : I get result back*
> >
> >
> >
> > *Log information for scenario #2.*92881911 [http-bio-8081-exec-177] INFO
> > org.apache.solr.core.SolrCore  – [dyCollection1_shard2_replica1]
> > webapp=/solr path=/select
> >
> >
> params={spellcheck=true&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=
> >
> >
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&fl=productURL,score&df=suggestAggregate&start=0&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fsv=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> > }
> > hits=1 status=0 QTime=1
>

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread S.L
Looking at the logging information I provided below, it appears that
results are only returned by this SolrCloud cluster if the request goes
to one particular replica of a shard's two replicas.

I have verified that numDocs in the replicas for a given shard is the
same, but there is a difference in maxDoc and deletedDocs; does this
signal the replicas being out of sync?

Even if the numDocs are the same, how do we guarantee that those docs
are identical and have the same unique keys? Is there a way to verify
this? Since the numDocs are the same across the replicas, and yet I only
get a result back when the request goes to one particular replica of the
shard, I suspect that the documents within the replicas of a shard are
not an exact replica set of each other.

I suspect the issue I am facing in 4.10.1 cloud is related to
https://issues.apache.org/jira/browse/SOLR-4924 .

Can anyone please let me know how to solve this issue of intermittent no
results for a query?



On Wed, Oct 15, 2014 at 3:15 PM, S.L  wrote:

> Tim,
>
> Thanks for the suggestion.
>
> I have rerun the query after adding shards.info=true and debug=track. I
> have included the XML data for both scenarios below. This happens
> intermittently on SolrCloud 4.10.1, with a replication factor of 2 and 3
> shards (6 cores): I get a result from one execution of the query and then
> no results from the subsequent one. I am hoping someone will be able to
> help me find the root cause with this additional information; I have
> included the query output with the additional parameters for both
> scenarios below.
>
> Thanks for your help!
>
> *Scenario #1 : In this try I get no results back. Here is what the query
> returns.*
>
> 
> 
>
>   0
>   29
>   
>  *:*
>  true
>  true
>  track
>  xml
>  (id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb)
>   
>
>
>   http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>  0
>  0.0
>  
> http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1
>  4
>   
>   http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>  0
>  0.0
>  
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1
>  13
>   
>   http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> ">
>  0
>  0.0
>  
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1
>  26
>   
>
>
>
>   
>  false
>   
>
>
>   
>   name="rid">server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17
>  
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> ">
>1
>4
>GET_TOP_IDS
>0
> name="Response">{responseHeader={status=0,QTime=1,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/,df=suggestAggregate,fl=thingURL,score,debugQuery=false,spellcheck.count=10,fq=(id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb),fsv=true,spellcheck.alternativeTermCount=5,spellcheck.maxResultsForSuggest=5,spellcheck.collateExtendedResults=true,spellcheck.extendedResults=true,spellcheck.maxCollations=5,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,rows=10,rid=server3.mydomain.com-dyCollection1_shard2_replica2-1413398784226-17,start=0,q=*:*,shards.info=true,spellcheck.dictionary=[direct,
> wordbreak],isShard=true}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={}}
> 
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> ">
>10
>13
>GET_TOP_IDS
>0
> name="Response">{responseHeader={status=0,QTime=10,params={spellcheck=true,spellcheck.maxCollationTries=10,distrib=false,debug=[false,
> track],version=2,NOW=1413398784225,shard.url=
> http://server3.mydomain.com:80

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-15 Thread S.L
Shawn,

Yes, I tried those two queries with distrib=false; I consistently get 0
results for the first and 1 result for the second query (i.e. server 3,
shard 2, replica 2).

However, if I run that same second query (server 3, shard 2, replica 2)
with distrib=true, I sometimes get a result and sometimes not. Shouldn't
this query always return a result when it is pointed at a core that
seems to have that document, regardless of distrib=true or false?

Unfortunately, I don't see anything in the logs that points to any
useful information.

BTW, you asked me to replace the request handler as appropriate; I use
the select request handler, so I cannot replace it with anything else.
Is that a problem?

Thanks.

On Thu, Oct 16, 2014 at 12:05 AM, Shawn Heisey  wrote:

> On 10/15/2014 9:26 PM, S.L wrote:
>
>> Look at the logging information I provided below , looks like the results
>> are only being returned back for this solrCloud cluster  if the request
>> goes to one of the two replicas of a shard.
>>
>> I have verified that numDocs in the replicas for a given shard is same but
>> there is difference in the maxDoc and deletedDocs, does this signal the
>> replicas being out of sync ?
>>
>> Even if the numDocs are same , how do we guarantee that those docs are
>> identical and have the same uniquekeys , is there a way to verify this ? I
>> am suspecting that  as the numDocs is same across the replicas , and still
>> only when the request goes to one of  the  replicas of the shard that I
>> get
>> a result back , the documents with in those replicas with in a shard are
>> not an exact replica set of each other.
>>
>> I suspect the issue I am facing in 4.10.1 cloud is related to
>> https://issues.apache.org/jira/browse/SOLR-4924  .
>>
>> Can anyone please let me know , how to solve this issue of intermittent no
>> results for a query ?
>>
>
> query with no results hits these cores:
> server 2 shard 3 replica1
> server 3 shard 1 replica 1
> server 1 shard 2 replica 1
>
> query with 1 result hits these cores:
> server 2 shard 1 replica 2
> server 3 shard 2 replica 2 (found 1)
> server 1 shard 3 replica 2
>
> Here's some URLs for some testing.  They are directed at specific shard
> replicas and are specifically NOT distributed queries:
>
> http://server1.mydomain.com:8081/solr/dyCollection1_
> shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-
> 8ff7dffb8bfb&distrib=false
>
> http://server3.mydomain.com:8081/solr/dyCollection1_
> shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-
> 8ff7dffb8bfb&distrib=false
>
> If you run these queries (replacing server names and the /select request
> handler as appropriate), do you get 0 results on the first one and 1 result
> on the second one?  If you do, then you've definitely got replicas out of
> sync.  If you get 1 result on both queries, then something else is
> breaking.  If by chance you have taken steps to fix this particular ID,
> pick another one that you know has a problem.
>
> There is no automated way to detect replicas out of sync.  You could
> request all docs on both replicas with distrib=false&fl=id&sort=id+asc,
> then compare the two lists.  Depending on how many docs you have, those
> queries could take a while to run.
>
> If the replicas are out of sync, are there any ERROR entries in the Solr
> log, especially at the time that the problem docs were indexed?
>
> Thanks,
> Shawn
>
>


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-16 Thread S.L
Shawn,

Please find the answers to your questions.

1. Java Version :java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

2.OS
CentOS Linux release 7.0.1406 (Core)

3. Everything is 64 bit , OS , Java , and CPU.

4. Java Args.
-Djava.io.tmpdir=/opt/tomcat1/temp
-Dcatalina.home=/opt/tomcat1
-Dcatalina.base=/opt/tomcat1
-Djava.endorsed.dirs=/opt/tomcat1/endorsed
-DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
server3.mydomain.com:2181
-DzkClientTimeout=2
-DhostContext=solr
-Dport=8081
-Dhost=server1.mydomain.com
-Dsolr.solr.home=/opt/solr/home1
-Dfile.encoding=UTF8
-Duser.timezone=UTC
-XX:+UseG1GC
-XX:MaxPermSize=128m
-XX:PermSize=64m
-Xmx2048m
-Xms128m
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.util.logging.config.file=/opt/tomcat1/conf/logging.properties

5. Zookeeper ensemble has 3 zookeeper instances , which are external and
are not embedded.


6. Container : I am using Tomcat Apache Tomcat Version 7.0.42

*Additional Observations:*

I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc
and then compared the two lists. Eyeballing the first few lines of ids
in both lists shows that, even though each list has an equal number of
documents (96,309 each), the document ids in them appear to be *mutually
exclusive*: I did not find a single common id among the 15 or so I
checked manually. It looks like the replicas are disjoint sets.

Thanks.



On Thu, Oct 16, 2014 at 1:41 AM, Shawn Heisey  wrote:

> On 10/15/2014 10:24 PM, S.L wrote:
>
>> Yes , I tried those two queries with distrib=false , I get 0 results for
>> first and 1 result  for the second query( (i.e. server 3 shard 2 replica
>> 2)  consistently.
>>
>> However if I run the same second query (i.e. server 3 shard 2 replica 2)
>> with distrib=true, I sometimes get a result and sometimes not , should'nt
>> this query always return a result when its pointing to a core that seems
>> to
>> have that document regardless of distrib=true or false ?
>>
>> Unfortunately I dont see anything particular in the logs to point to any
>> information.
>>
>> BTW you asked me to replace the request handler , I use the select request
>> handler ,so I cannot replace it with anything else , is that  a problem ?
>>
>
> If you send the query with distrib=true (which is the default value in
> SolrCloud), then it treats it just as if you had sent it to
> /solr/collection instead of /solr/collection_shardN_replicaN, so it's a
> full distributed query. The distrib=false is required to turn that behavior
> off and ONLY query the index on the actual core where you sent it.
>
> I only said to replace those things as appropriate.  Since you are using
> /select, it's no problem that you left it that way. If I were to assume
> that you used /select, but you didn't, the URLs as I wrote them might not
> have worked.
>
> As discussed, this means that your replicas are truly out of sync.  It's
> difficult to know what caused it, especially if you can't see anything in
> the log when you indexed the missing documents.
>
> We know you're on Solr 4.10.1.  This means that your Java is a 1.7
> version, since Java7 is required.
>
> Here's where I ask a whole lot of questions about your setup. What is the
> precise Java version, and which vendor's Java are you using?  What
> operating system is it on?  Is everything 64-bit, or is any piece (CPU, OS,
> Java) 32-bit?  On the Solr admin UI dashboard, it lists all parameters used
> when starting Java, labelled as "Args".  Can you include those?  Is
> zookeeper external, or embedded in Solr?  Is it a 3-server (or more)
> ensemble?  Are you using the example jetty, or did you provide your own
> servlet container?
>
> We recommend 64-bit Oracle Java, the latest 1.7 version.  OpenJDK (since
> version 1.7.x) should be pretty safe as well, but IBM's Java should be
> avoided.  IBM does very aggressive runtime optimizations.  These can make
> programs run faster, but they are known to negatively affect Lucene/Solr.
>
> Thanks,
> Shawn
>
>


Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-16 Thread S.L
Shawn,

   1. I will upgrade to JVM build 67 shortly.
   2. This is a new collection; I was facing a similar issue in 4.7 and,
   based on Erick's recommendation, I upgraded to 4.10.1 and created a new
   collection.
   3. Yes, I am hitting the replicas of the same shard, and I see the lists
   are completely non-overlapping. I am using CloudSolrServer to add the
   documents.
   4. I have a 3-node physical cluster, with each node having 16GB of
   memory.
   5. I also have a custom request handler defined in my solrconfig.xml as
   below; however, I am not using it and only use the default select
   handler. My MyCustomHandler class has been added to the source and
   included in the build, but it is not being used for any requests yet.

  

  suggestAggregate

  direct
  
  on
  true
  10
  5
  5
  true
  true
  10
  5


  spellcheck
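The archive has stripped the XML tags from the handler definition above, leaving only the parameter values. Judging from the spellcheck parameters visible in the distributed-query logs earlier in the thread (spellcheck.count=10, spellcheck.alternativeTermCount=5, spellcheck.maxResultsForSuggest=5, spellcheck.maxCollationTries=10, spellcheck.maxCollations=5), the definition likely had roughly this shape; the handler name and class are placeholders, and the exact parameter order is inferred:

```xml
<!-- Hypothetical reconstruction; name and class are placeholders. -->
<requestHandler name="/mycustom" class="com.example.MyCustomHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">suggestAggregate</str>
    <str name="spellcheck.dictionary">direct</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```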

  


6. The clusterstate.json is copied below

{"dyCollection1":{
    "shards":{
      "shard1":{
        "range":"8000-d554",
        "state":"active",
        "replicas":{
          "core_node3":{
            "state":"active",
            "core":"dyCollection1_shard1_replica1",
            "node_name":"server3.mydomain.com:8082_solr",
            "base_url":"http://server3.mydomain.com:8082/solr"},
          "core_node4":{
            "state":"active",
            "core":"dyCollection1_shard1_replica2",
            "node_name":"server2.mydomain.com:8081_solr",
            "base_url":"http://server2.mydomain.com:8081/solr",
            "leader":"true"}}},
      "shard2":{
        "range":"d555-2aa9",
        "state":"active",
        "replicas":{
          "core_node1":{
            "state":"active",
            "core":"dyCollection1_shard2_replica1",
            "node_name":"server1.mydomain.com:8081_solr",
            "base_url":"http://server1.mydomain.com:8081/solr",
            "leader":"true"},
          "core_node6":{
            "state":"active",
            "core":"dyCollection1_shard2_replica2",
            "node_name":"server3.mydomain.com:8081_solr",
            "base_url":"http://server3.mydomain.com:8081/solr"}}},
      "shard3":{
        "range":"2aaa-7fff",
        "state":"active",
        "replicas":{
          "core_node2":{
            "state":"active",
            "core":"dyCollection1_shard3_replica2",
            "node_name":"server1.mydomain.com:8082_solr",
            "base_url":"http://server1.mydomain.com:8082/solr",
            "leader":"true"},
          "core_node5":{
            "state":"active",
            "core":"dyCollection1_shard3_replica1",
            "node_name":"server2.mydomain.com:8082_solr",
            "base_url":"http://server2.mydomain.com:8082/solr"}}}},
    "maxShardsPerNode":"1",
    "router":{"name":"compositeId"},
    "replicationFactor":"2",
    "autoAddReplicas":"false"}}

  Thanks!

On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey  wrote:

> On 10/16/2014 6:27 PM, S.L wrote:
>
>> 1. Java Version :java version "1.7.0_51"
>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>
>
> I believe that build 51 is one of those that is known to have bugs related
> to Lucene.  If you can upgrade this to 67, that would be good, but I don't
> know that it's a pressing matter.  It looks like the Oracle JVM, which is
> good.
>
>  2.OS
>> CentOS Linux release 7.0.1406 (Core)
>>
>> 3. Everything is 64 bit , OS , Java , and CPU.
>>
>> 4. Java Args.
>>  -Djava.io.tmpdir=/opt/tomcat1/temp
>>  -Dcatalina.home=/opt/tomcat1
>>  -Dcatalina.base=/opt/tomcat1
>>  -Djava.endorsed.dirs=/opt/tomcat1/endorsed
>>  -DzkHost=server1.mydomain.com:2181,server2.mydomain.com:2181,
>> server3.mydomain.com:2181
>>  -DzkClientTimeout=2
>>  -DhostContext=solr
>>  -Dport=8081
>>  -Dhost=s

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-17 Thread S.L
Shawn,

Just wondering if you have any other suggestions on what the next steps
would be. Thanks.

On Thu, Oct 16, 2014 at 11:12 PM, S.L  wrote:

> Shawn ,
>
>
>1. I will upgrade to 67 JVM  shortly .
>2. This is  a new collection as , I was facing a similar issue in 4.7
>and based on Erick's recommendation I updated to 4.10.1 and created a new
>collection.
>3. Yes, I am hitting the replicas of the same shard and I see the
>lists are completely non overlapping.I am using CloudSolrServer to add the
>documents.
>4. I have a 3 physical node cluster , with each having 16GB in memory.
>5. I also have a custom request handler defined in my solrconfig.xml
>as below , however I am not using that and I am only using the default
>select handler, but my MyCustomHandler class has been been added to the
>source and included in the build , but not being used for any requests yet.
>
>startup="lazy">
> 
>   suggestAggregate
>
>   direct
>   
>   on
>   true
>   10
>   5
>   5
>   true
>   true
>   10
>   5
> 
> 
>   spellcheck
> 
>   
>
>
> 5. The clusterstate.json is copied below
>
> {"dyCollection1":{
> "shards":{
>   "shard1":{
> "range":"8000-d554",
> "state":"active",
> "replicas":{
>   "core_node3":{
> "state":"active",
> "core":"dyCollection1_shard1_replica1",
> "node_name":"server3.mydomain.com:8082_solr",
> "base_url":"http://server3.mydomain.com:8082/solr"},
>   "core_node4":{
> "state":"active",
> "core":"dyCollection1_shard1_replica2",
> "node_name":"server2.mydomain.com:8081_solr",
> "base_url":"http://server2.mydomain.com:8081/solr";,
> "leader":"true"}}},
>   "shard2":{
> "range":"d555-2aa9",
> "state":"active",
> "replicas":{
>   "core_node1":{
> "state":"active",
> "core":"dyCollection1_shard2_replica1",
> "node_name":"server1.mydomain.com:8081_solr",
> "base_url":"http://server1.mydomain.com:8081/solr";,
> "leader":"true"},
>   "core_node6":{
> "state":"active",
> "core":"dyCollection1_shard2_replica2",
> "node_name":"server3.mydomain.com:8081_solr",
> "base_url":"http://server3.mydomain.com:8081/solr"}}},
>   "shard3":{
> "range":"2aaa-7fff",
> "state":"active",
> "replicas":{
>   "core_node2":{
> "state":"active",
> "core":"dyCollection1_shard3_replica2",
> "node_name":"server1.mydomain.com:8082_solr",
> "base_url":"http://server1.mydomain.com:8082/solr";,
> "leader":"true"},
>   "core_node5":{
> "state":"active",
> "core":"dyCollection1_shard3_replica1",
> "node_name":"server2.mydomain.com:8082_solr",
> "base_url":"http://server2.mydomain.com:8082/solr",
> "maxShardsPerNode":"1",
> "router":{"name":"compositeId"},
> "replicationFactor":"2",
> "autoAddReplicas":"false"}}
>
>   Thanks!
>
> On Thu, Oct 16, 2014 at 9:02 PM, Shawn Heisey  wrote:
>
>> On 10/16/2014 6:27 PM, S.L wrote:
>>
>>> 1. Java Version :java version "1.7.0_51"
>>> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
>>>
>>
>> I believe that build 51 is one of those that is known to have bugs
>> related to Lucene.  If you can upgrade this to 67, that would be good, but
>>

Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.

2014-10-23 Thread S.L
ing replication process
471653943 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Number of files in latest index in master: 108
471653944 [RecoveryThread] INFO
org.apache.solr.core.CachingDirectoryFactory  – return new directory for
/opt/solr/home1/dyCollection1_shard2_replica1/data/index.2014101839463
471654573 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Starting download to
NRTCachingDirectory(MMapDirectory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.2014101839463
lockFactory=NativeFSLockFactory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.2014101839463;
maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=true
471834454 [zkCallback-2-thread-12] INFO
org.apache.solr.common.cloud.ZkStateReader  – A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
471897454 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
Total time taken for download : 243 secs
471898551 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  – New
index installed. Updating index properties... index=index.2014101839463
471898932 [RecoveryThread] INFO  org.apache.solr.handler.SnapPuller  –
removing old index directory
NRTCachingDirectory(MMapDirectory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index
lockFactory=NativeFSLockFactory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index;
maxCacheMB=48.0 maxMergeSizeMB=4.0)
471898932 [RecoveryThread] INFO
org.apache.solr.update.DefaultSolrCoreState  – Creating new IndexWriter...
471898934 [RecoveryThread] INFO
org.apache.solr.update.DefaultSolrCoreState  – Waiting until IndexWriter is
unused... core=dyCollection1_shard2_replica1
471898934 [RecoveryThread] INFO
org.apache.solr.update.DefaultSolrCoreState  – Rollback old IndexWriter...
core=dyCollection1_shard2_replica1
471904192 [RecoveryThread] INFO  org.apache.solr.core.SolrCore  – New index
directory detected:
old=/opt/solr/home1/dyCollection1_shard2_replica1/data/index/
new=/opt/solr/home1/dyCollection1_shard2_replica1/data/index.2014101839463
471904907 [RecoveryThread] INFO  org.apache.solr.core.SolrCore  –
SolrDeletionPolicy.onInit: commits: num=1

commit{dir=NRTCachingDirectory(MMapDirectory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.2014101839463
lockFactory=NativeFSLockFactory@/opt/solr/home1/dyCollection1_shard2_replica1/data/index.2014101839463;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_88t,generation=10685}
471904907 [RecoveryThread] INFO  org.apache.solr.core.SolrCore  – newest
commit generation = 10685

On Fri, Oct 17, 2014 at 1:12 PM, S.L  wrote:

> Shawn,
>
> Just wondering if you have any other suggestions on what the next steps
> whould be ? Thanks.
>
> On Thu, Oct 16, 2014 at 11:12 PM, S.L  wrote:
>
>> Shawn ,
>>
>>
>>1. I will upgrade to 67 JVM  shortly .
>>2. This is  a new collection as , I was facing a similar issue in 4.7
>>and based on Erick's recommendation I updated to 4.10.1 and created a new
>>collection.
>>3. Yes, I am hitting the replicas of the same shard and I see the
>>lists are completely non overlapping.I am using CloudSolrServer to add the
>>documents.
>>4. I have a 3 physical node cluster , with each having 16GB in memory.
>>5. I also have a custom request handler defined in my solrconfig.xml
>>as below , however I am not using that and I am only using the default
>>select handler, but my MyCustomHandler class has been been added to the
>>source and included in the build , but not being used for any requests 
>> yet.
>>
>>   > startup="lazy">
>> 
>>   suggestAggregate
>>
>>   direct
>>   
>>   on
>>   true
>>   10
>>   5
>>   5
>>   true
>>   true
>>   10
>>   5
>> 
>> 
>>   spellcheck
>> 
>>   
>>
>>
>> 5. The clusterstate.json is copied below
>>
>> {"dyCollection1":{
>> "shards":{
>>   "shard1":{
>> "range":"8000-d554",
>> "state":"active",
>> "replicas":{
>>   "core_node3":{
>> "state":"active",
>> "core":"dyCollection1_shard1_replica1",
>> "node_name":"server3.mydomain.com:8082_solr",
>> "base_url":"http://server3.mydomain.com:8082/solr"},
>>   "core_node4":{
>> "state":&quo

Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-26 Thread S.L
Folks,

I have posted about this previously. I am using SolrCloud 4.10.1 and have
a sharded collection with 6 nodes, 3 shards, and a replication factor of 2.

I am indexing into Solr using a Hadoop job; I have 15 map fetch tasks,
each of which can have up to 5 threads, so the load on the indexing side
can get as high as 75 concurrent threads.

I am facing an issue where the replicas of a particular shard(s) are
consistently getting out of sync. Initially I thought this was because I
was using a custom component, but I did a fresh install, removed the
custom component, and reindexed using the Hadoop job; I still see the
same behavior.

I do not see any exceptions in my catalina.out, such as OOM or any other
exceptions. I suspect this could be because of the multi-threaded nature
of the Hadoop indexing job. I use CloudSolrServer from my Java code to
index, and initialize the CloudSolrServer using a 3-node ZK ensemble.

Does anyone know of any known issues with highly multi-threaded indexing
and SolrCloud?

Can someone help? This issue has been slowing things down on my end for a
while now.

Thanks and much appreciated!
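For the indexing setup described above, a single CloudSolrServer instance is generally intended to be shared across threads, and a common way to tame heavy concurrency is to batch documents and cap the number of in-flight senders. A generic, self-contained sketch of that pattern (the sendBatch body is a stand-in for cloudSolrServer.add(batch); class and method names are illustrative):

```java
import java.util.*;
import java.util.concurrent.*;

public class BatchedIndexer {
    private final ExecutorService pool;
    private final int batchSize;
    private final Queue<List<String>> sent = new ConcurrentLinkedQueue<>();

    BatchedIndexer(int senders, int batchSize) {
        this.pool = Executors.newFixedThreadPool(senders); // cap concurrent senders
        this.batchSize = batchSize;
    }

    // Splits docs into fixed-size batches and submits each batch to the bounded pool.
    void index(List<String> docIds) {
        for (int i = 0; i < docIds.size(); i += batchSize) {
            List<String> batch = new ArrayList<>(
                    docIds.subList(i, Math.min(i + batchSize, docIds.size())));
            pool.submit(() -> sendBatch(batch));
        }
    }

    // Stand-in for cloudSolrServer.add(batch); here we just record what was sent.
    void sendBatch(List<String> batch) {
        sent.add(batch);
    }

    // Drains the pool and returns the total number of docs sent.
    int shutdownAndCount() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return sent.stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) throws InterruptedException {
        BatchedIndexer indexer = new BatchedIndexer(4, 100);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) ids.add("doc-" + i);
        indexer.index(ids);
        System.out.println("indexed " + indexer.shutdownAndCount() + " docs");
        // prints: indexed 1000 docs
    }
}
```

Capping senders this way (4 here, instead of 75 unbounded threads) keeps indexing pressure on the cluster predictable while still batching efficiently.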


Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread S.L
Thanks Otis,

I have checked the logs (in my case the default catalina.out) and I
don't see any OOMs or any other exceptions.

What other metrics do you suggest?

On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> You may simply be overwhelming your cluster-nodes. Have you checked
> various metrics to see if that is the case?
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> > On Oct 26, 2014, at 9:59 PM, S.L  wrote:
> >
> > Folks,
> >
> > I have posted previously about this , I am using SolrCloud 4.10.1 and
> have
> > a sharded collection with  6 nodes , 3 shards and a replication factor
> of 2.
> >
> > I am indexing Solr using a Hadoop job , I have 15 Map fetch tasks , that
> > can each have upto 5 threds each , so the load on the indexing side can
> get
> > to as high as 75 concurrent threads.
> >
> > I am facing an issue where the replicas of a particular shard(s) are
> > consistently getting out of synch , initially I thought this was
> beccause I
> > was using a custom component , but I did a fresh install and removed the
> > custom component and reindexed using the Hadoop job , I still see the
> same
> > behavior.
> >
> > I do not see any exceptions in my catalina.out , like OOM , or any other
> > excepitions, I suspecting thi scould be because of the multi-threaded
> > indexing nature of the Hadoop job . I use CloudSolrServer from my java
> code
> > to index and initialize the CloudSolrServer using a 3 node ZK ensemble.
> >
> > Does any one know of any known issues with a highly multi-threaded
> indexing
> > and SolrCloud ?
> >
> > Can someone help ? This issue has been slowing things down on my end for
> a
> > while now.
> >
> > Thanks and much appreciated!
>


Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread S.L
Markus,

I would like to ignore it too, but what's happening is that there is a
lot of discrepancy between the replicas: queries like
q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on
which replica the request goes to, because of the huge discrepancy
between the replicas.

Thank you for confirming that it is a known issue; I was thinking I was
the only one facing this, due to my setup.



On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma 
wrote:

> It is an ancient issue. One of the major contributors to the issue was
> resolved some versions ago but we are still seeing it sometimes too, there
> is nothing to see in the logs. We ignore it and just reindex.
>
> -Original message-
> > From:S.L 
> > Sent: Monday 27th October 2014 16:25
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> out of synch.
> >
> > Thank Otis,
> >
> > I have checked the logs , in my case the default catalina.out and I dont
> > see any OOMs or , any other exceptions.
> >
> > What others metrics do you suggest ?
> >
> > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > You may simply be overwhelming your cluster-nodes. Have you checked
> > > various metrics to see if that is the case?
> > >
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > > On Oct 26, 2014, at 9:59 PM, S.L  wrote:
> > > >
> > > > Folks,
> > > >
> > > > I have posted previously about this , I am using SolrCloud 4.10.1 and
> > > have
> > > > a sharded collection with  6 nodes , 3 shards and a replication
> factor
> > > of 2.
> > > >
> > > > I am indexing Solr using a Hadoop job , I have 15 Map fetch tasks ,
> that
> > > > can each have upto 5 threds each , so the load on the indexing side
> can
> > > get
> > > > to as high as 75 concurrent threads.
> > > >
> > > > I am facing an issue where the replicas of a particular shard(s) are
> > > > consistently getting out of synch , initially I thought this was
> > > beccause I
> > > > was using a custom component , but I did a fresh install and removed
> the
> > > > custom component and reindexed using the Hadoop job , I still see the
> > > same
> > > > behavior.
> > > >
> > > > I do not see any exceptions in my catalina.out , like OOM , or any
> other
> > > > excepitions, I suspecting thi scould be because of the multi-threaded
> > > > indexing nature of the Hadoop job . I use CloudSolrServer from my
> java
> > > code
> > > > to index and initialize the CloudSolrServer using a 3 node ZK
> ensemble.
> > > >
> > > > Does any one know of any known issues with a highly multi-threaded
> > > indexing
> > > > and SolrCloud ?
> > > >
> > > > Can someone help ? This issue has been slowing things down on my end
> for
> > > a
> > > > while now.
> > > >
> > > > Thanks and much appreciated!
> > >
> >
>


Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread S.L
One is not smaller than the other: the numDocs is the same for both
"replicas", and essentially they seem to be disjoint sets.

Also, manually purging the replicas is not an option, because this is a
frequently reindexed index and we need everything to be automated.

What other options do I have now?

1. Turn off replication completely in SolrCloud
2. Use the traditional master/slave replication model
3. Introduce a "replica"-aware field in the index, to figure out from
the client which "replica" the request should go to
4. Try a distribution like Helios to see if it behaves any differently

Just thinking out loud here ...

On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma 
wrote:

> Hi - if there is a very large discrepancy, you could consider to purge the
> smallest replica, it will then resync from the leader.
>
>
> -Original message-
> > From:S.L 
> > Sent: Monday 27th October 2014 16:41
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> out of synch.
> >
> > Markus,
> >
> > I would like to ignore it too, but whats happening is that the there is a
> > lot of discrepancy between the replicas , queries like
> > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on
> which
> > replica the request goes to, because of huge amount of discrepancy
> between
> > the replicas.
> >
> > Thank you for confirming that it is a know issue , I was thinking I was
> the
> > only one facing this due to my set up.
> >
> > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > It is an ancient issue. One of the major contributors to the issue was
> > > resolved some versions ago but we are still seeing it sometimes too,
> there
> > > is nothing to see in the logs. We ignore it and just reindex.
> > >
> > > -Original message-
> > > > From:S.L 
> > > > Sent: Monday 27th October 2014 16:25
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
> replicas
> > > out of synch.
> > > >
> > > > Thank Otis,
> > > >
> > > > I have checked the logs , in my case the default catalina.out and I
> dont
> > > > see any OOMs or , any other exceptions.
> > > >
> > > > What others metrics do you suggest ?
> > > >
> > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > > > otis.gospodne...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > You may simply be overwhelming your cluster-nodes. Have you checked
> > > > > various metrics to see if that is the case?
> > > > >
> > > > > Otis
> > > > > --
> > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> Management
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > >
> > > > >
> > > > > > On Oct 26, 2014, at 9:59 PM, S.L 
> wrote:
> > > > > >
> > > > > > Folks,
> > > > > >
> > > > > > I have posted previously about this , I am using SolrCloud
> 4.10.1 and
> > > > > have
> > > > > > a sharded collection with  6 nodes , 3 shards and a replication
> > > factor
> > > > > of 2.
> > > > > >
> > > > > > I am indexing Solr using a Hadoop job , I have 15 Map fetch
> tasks ,
> > > that
> > > > > > can each have upto 5 threds each , so the load on the indexing
> side
> > > can
> > > > > get
> > > > > > to as high as 75 concurrent threads.
> > > > > >
> > > > > > I am facing an issue where the replicas of a particular shard(s)
> are
> > > > > > consistently getting out of synch , initially I thought this was
> > > > > beccause I
> > > > > > was using a custom component , but I did a fresh install and
> removed
> > > the
> > > > > > custom component and reindexed using the Hadoop job , I still
> see the
> > > > > same
> > > > > > behavior.
> > > > > >
> > > > > > I do not see any exceptions in my catalina.out , like OOM , or
> any
> > > other
> > > > > > excepitions, I suspecting thi scould be because of the
> multi-threaded
> > > > > > indexing nature of the Hadoop job . I use CloudSolrServer from my
> > > java
> > > > > code
> > > > > > to index and initialize the CloudSolrServer using a 3 node ZK
> > > ensemble.
> > > > > >
> > > > > > Does any one know of any known issues with a highly
> multi-threaded
> > > > > indexing
> > > > > > and SolrCloud ?
> > > > > >
> > > > > > Can someone help ? This issue has been slowing things down on my
> end
> > > for
> > > > > a
> > > > > > while now.
> > > > > >
> > > > > > Thanks and much appreciated!
> > > > >
> > > >
> > >
>


Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread S.L
Please find the clusterstate.json attached.

Also, in this case at least the Shard1 replicas are out of sync, as can be
seen below.


*Shard 1 replica 1 *does not* return a result with distrib=false.*
*Query :*
http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

*Result :* numFound=0 — no document returned (status=0, QTime=1,
fq=(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5))


*Shard1 replica 2 *does* return the result with distrib=false.*
*Query:*
http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true

*Result:* numFound=1 — the document is returned (status=0, QTime=1,
id=9f4748c0-fe16-4632-b74e-4fee6b80cbf5, _version_=1483135330558148608,
plus a stored field value http://www.xyz.com)

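The per-replica check shown above can be scripted across every replica of the shard. A small sketch that builds the same non-distributed query URL for each replica (the host names and collection name are the placeholders used in this thread):

```python
from urllib.parse import urlencode

def replica_check_url(base_url, doc_id):
    """Build a non-distributed (distrib=false) lookup for one document id,
    mirroring the debug queries used above."""
    params = {
        "q": "*:*",
        "fq": "(id:%s)" % doc_id,
        "wt": "xml",
        "distrib": "false",       # query only this replica core
        "debug": "track",
        "shards.info": "true",
    }
    return "%s/select/?%s" % (base_url.rstrip("/"), urlencode(params))

if __name__ == "__main__":
    doc = "9f4748c0-fe16-4632-b74e-4fee6b80cbf5"
    for host in ("server2.mydomain.com", "server3.mydomain.com"):
        print(replica_check_url("http://%s:8082/solr/dyCollection1" % host, doc))
```

Fetching each URL and comparing numFound across replicas flags exactly the inconsistency shown in the two results above.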
On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Mon, Oct 27, 2014 at 9:40 PM, S.L  wrote:
>
> > One is not smaller than the other, because the numDocs is same for both
> > "replicas" and essentially they seem to be disjoint sets.
> >
>
> That is strange. Can we see your clusterstate.json? With that, please also
> specify the two replicas which are out of sync.
>
> >
> > Also manually purging the replicas is not option , because this is
> > "frequently" indexed index and we need everything to be automated.
> >
> > What other options do I have now.
> >
> > 1. Turn of the replication completely in SolrCloud
> > 2. Use traditional Master Slave replication model.
> > 3. Introduce a "replica" aware field in the index , to figure out which
> > "replica" the request should go to from the client.
> > 4. Try a distribution like Helios to see if it has any different
> behavior.
> >
> > Just think out loud here ..
> >
> > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsma <
> > markus.jel...@openindex.io>
> > wrote:
> >
> > > Hi - if there is a very large discrepancy, you could consider to purge
> > the
> > > smallest replica, it will then resync from the leader.
> > >
> > >
> > > -Original message-
> > > > From:S.L 
> > > > Sent: Monday 27th October 2014 16:41
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
> > replicas
> > > out of synch.
> > > >
> > > > Markus,
> > > >
> > > > I would like to ignore it too, but whats happening is that the there
> > is a
> > > > lot of discrepancy between the replicas , queries like
> > > > q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47) fail depending on
> > > which
> > > > replica the request goes to, because of huge amount of discrepancy
> > > between
> > > > the replicas.
> > > >
> > > > Thank you for confirming that it is a know issue , I was thinking I
> was
> > > the
> > > > only one facing this due to my set up.
> > > >
> > > > On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma <
> > > markus.jel...@openindex.io>
> > > > wrote:
> > > >
> > > > > It is an ancient issue. One of the major contributors to the issue
> > was
> > > > > resolved some versions ago but we are still seeing it sometimes
> too,
> > > there
> > > > > is nothing to see in the logs. We ignore it and just reindex.
> > > > >
> > > > > -Original message-
> > > > > > From:S.L 
> > > > > > Sent: Monday 27th October 2014 16:25
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
> > > replicas
> > > > > out of synch.
> > > > > >
> > > > > > Thank Otis,
> > > > > >
> > > > > > I have checked the logs , in my case the default catalina.out
> and I
> > > dont
> > > > > > see any OOMs or , any other exceptions.
> > > > > >
> > > > > > What others metrics do you suggest ?
> > > > > >
> > > > > > On Mon, Oct 27, 2014 at 9:26 AM, Otis Gospodnetic <
> > > > > > otis.gospodne...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > You may simp

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread S.L
Good point about the ZK logs; I do see the following exceptions intermittently
in the ZK log.

2014-10-27 06:54:14,621 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
2014-10-27 07:00:06,697 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /xxx.xxx.xxx.xxx:37336
2014-10-27 07:00:06,725 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish
new session at /xxx.xxx.xxx.xxx:37336
2014-10-27 07:00:06,746 [myid:1] - INFO
[CommitProcessor:1:ZooKeeperServer@617] - Established session
0x14949db9da40037 with negotiated timeout 1 for client
/xxx.xxx.xxx.xxx:37336
2014-10-27 07:01:06,520 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x14949db9da40037, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:744)

As for queuing theory, I don't know of any way to see how fast the requests
are being served by SolrCloud, or whether a queue is building up because the
service rate is slower than the arrival rate of requests from the incoming
multiple threads.
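Tomcat's AccessLogValve (mentioned below) can log the time taken per request via the %D pattern token (milliseconds), which gives a crude handle on both quantities: request count and service time. A sketch that tallies them from such a log — the exact line format here is an assumption; adjust the regex to your actual AccessLogValve pattern:

```python
import re

# Matches lines shaped like:
#   1.2.3.4 - - [27/Oct/2014:16:41:02 +0000] "POST /solr/update HTTP/1.1" 200 123 57
# where the trailing field is %D (time taken, ms) from AccessLogValve.
LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" \d+ \S+ (?P<ms>\d+)$')

def summarize(lines):
    """Return request count and latency stats from access-log lines."""
    times = [int(m.group("ms")) for m in map(LINE.search, lines) if m]
    if not times:
        return None
    times.sort()
    return {
        "requests": len(times),
        "max_ms": times[-1],
        "p95_ms": times[min(len(times) - 1, int(0.95 * len(times)))],
    }

if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [27/Oct/2014:16:41:02 +0000] "POST /solr/update HTTP/1.1" 200 123 57',
        '1.2.3.4 - - [27/Oct/2014:16:41:03 +0000] "GET /solr/select HTTP/1.1" 200 456 12',
    ]
    print(summarize(sample))
```

If p95 service time climbs while the indexing threads keep a constant arrival rate, the cluster is saturating — the queuing-theory symptom raised in this thread.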

On Mon, Oct 27, 2014 at 7:09 PM, Will Martin  wrote:

> 2 naïve comments, of course.
>
>
>
> -  Queuing theory
>
> -  Zookeeper logs.
>
>
>
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: Monday, October 27, 2014 1:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> out of synch.
>
>
>
> Please find the clusterstate.json attached.
>
> Also in this case atleast the Shard1 replicas are out of sync , as can be
> seen below.
>
> Shard 1 replica 1 *does not* return a result with distrib=false.
>
> Query :http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:* <
> http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true>
> &fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&
> shards.info=true
>
>
>
> Result :
>
> 0 name="QTime">1*:*truefalse name="debug">trackxml name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5) name="response" numFound="0" start="0"/>
>
>
>
> Shard1 replica 2 *does* return the result with distrib=false.
>
> Query: http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:* <
> http://server2.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&shards.info=true>
> &fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debug=track&
> shards.info=true
>
> Result:
>
> 0 name="QTime">1*:*truefalse name="debug">trackxml name="fq">(id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5) name="response" numFound="1" start="0">
> http://www.xyz.com name="id">9f4748c0-fe16-4632-b74e-4fee6b80cbf5 name="_version_">1483135330558148608 name="debug"/>
>
>
>
> On Mon, Oct 27, 2014 at 12:19 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> On Mon, Oct 27, 2014 at 9:40 PM, S.L  wrote:
>
> > One is not smaller than the other, because the numDocs is same for both
> > "replicas" and essentially they seem to be disjoint sets.
> >
>
> That is strange. Can we see your clusterstate.json? With that, please also
> specify the two replicas which are out of sync.
>
> >
> > Also manually purging the replicas is not option , because this is
> > "frequently" indexed index and we need everything to be automated.
> >
> > What other options do I have now.
> >
> > 1. Turn of the replication completely in SolrCloud
> > 2. Use traditional Master Slave replication model.
> > 3. Introduce a "replica" aware field in the index , to figure out which
> > "replica" the request should go to from the client.
> > 4. Try a distribution like Helios to see if it has any different
> behavior.
> >
> > Just think out loud here ..
> >
> > On Mon, Oct 27, 2014 at 11:56 AM, Markus Jelsm

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28 Thread S.L
Will,

I think in one of your other emails (which I am not able to find) you had
asked if I was indexing directly from MapReduce jobs. Yes, I am indexing
directly from the map task, and that is done using SolrJ with a
CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use
something like MapReduceIndexerTool, which I suppose writes to HDFS and
then moves that to the Solr index in a subsequent step? If so, why?

I don't use soft commits, and autocommit every 15 seconds; the snippet from
the configuration can be seen below.

 
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

I looked at the localhost_access.log file; all the GET and POST requests
have sub-second response times.




On Tue, Oct 28, 2014 at 2:06 AM, Will Martin  wrote:

> The easiest, and coarsest measure of response time [not service time in a
> distributed system] can be picked up in your localhost_access.log file.
> You're using tomcat write?  Lookup AccessLogValve in the docs and
> server.xml. You can add configuration to report the payload and time to
> service the request without touching any code.
>
> Queueing theory is what Otis was talking about when he said you've
> saturated your environment. In AWS people just auto-scale up and don't
> worry about where the load comes from; its dumb if it happens more than 2
> times. Capacity planning is tough, let's hope it doesn't disappear
> altogether.
>
> G'luck
>
>
> -Original Message-
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: Monday, October 27, 2014 9:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
> out of synch.
>
> Good point about ZK logs , I do see the following exceptions
> intermittently in the ZK log.
>
> 2014-10-27 06:54:14,621 [myid:1] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
> 2014-10-27 07:00:06,697 [myid:1] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
> connection from /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:00:06,725 [myid:1] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
> establish new session at /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:00:06,746 [myid:1] - INFO
> [CommitProcessor:1:ZooKeeperServer@617] - Established session
> 0x14949db9da40037 with negotiated timeout 1 for client
> /xxx.xxx.xxx.xxx:37336
> 2014-10-27 07:01:06,520 [myid:1] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x14949db9da40037, likely client has closed socket
> at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at
>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:744)
>
> For queuing theory , I dont know of any way to see how fasts the requests
> are being served by SolrCloud , and if a queue is being maintained if the
> service rate is slower than the rate of requests from the incoming multiple
> threads.
>
> On Mon, Oct 27, 2014 at 7:09 PM, Will Martin  wrote:
>
> > 2 naïve comments, of course.
> >
> >
> >
> > -  Queuing theory
> >
> > -  Zookeeper logs.
> >
> >
> >
> > From: S.L [mailto:simpleliving...@gmail.com]
> > Sent: Monday, October 27, 2014 1:42 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1
> > replicas out of synch.
> >
> >
> >
> > Please find the clusterstate.json attached.
> >
> > Also in this case atleast the Shard1 replicas are out of sync , as can
> > be seen below.
> >
> > Shard 1 replica 1 *does not* return a result with distrib=false.
> >
> > Query
> > :http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:* <
> > http://server3.mydomain.com:8082/solr/dyCollection1/select/?q=*:*&fq=%
> > 28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false&debu
> > g=track&shards.info=true>
> > &fq=%28id:9f4748c0-fe16-4632-b74e-4fee6b80cbf5%29&wt=xml&distrib=false
> > &debug=track&
> > shards.info=true
> >
> >
> >
> > Result :
> >
> > 0 > name="QTime">1*:*truefalse > name="debug">trackxml > name="fq">(id:9f4748c0-fe1

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28 Thread S.L
I'm using Apache Hadoop and Solr; do I need to switch to Cloudera?

On Tue, Oct 28, 2014 at 1:27 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> We index directly from mappers using SolrJ. It does work, but you pay the
> price of having to instantiate all those sockets vs. the way
> MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer
> directly in the Reduce task.
>
> You don't *need* to use MapReduceIndexerTool, but it's more efficient, and
> if you don't, you then have to make sure to appropriately tune your Hadoop
> implementation to match what your Solr installation is capable of.
>
> On 10/28/14 12:39, S.L wrote:
>
>> Will,
>>
>> I think in one of your other emails(which I am not able to find) you has
>> asked if I was indexing directly from MapReduce jobs, yes I am indexing
>> directly from the map task and that is done using SolrJ with a
>> SolrCloudServer initialized with the ZK ensemble URLs.Do I need to use
>> something like MapReducerIndexerTool , which I suupose writes to HDFS and
>> that is in a subsequent step moved to Solr index ? If so why ?
>>
>> I dont use any softCommits and do autocommit every 15 seconds , the
>> snippet
>> in the configuration can be seen below.
>>
>>   
>> ${solr.
>> autoSoftCommit.maxTime:-1}
>>   
>>
>>   
>> ${solr.autoCommit.maxTime:15000}
>>
>> true
>>   
>>
>> I looked at the localhost_access.log file ,  all the GET and POST requests
>> have a sub-second response time.
>>
>>
>>
>>
>> On Tue, Oct 28, 2014 at 2:06 AM, Will Martin 
>> wrote:
>>
>>  The easiest, and coarsest measure of response time [not service time in a
>>> distributed system] can be picked up in your localhost_access.log file.
>>> You're using tomcat write?  Lookup AccessLogValve in the docs and
>>> server.xml. You can add configuration to report the payload and time to
>>> service the request without touching any code.
>>>
>>> Queueing theory is what Otis was talking about when he said you've
>>> saturated your environment. In AWS people just auto-scale up and don't
>>> worry about where the load comes from; its dumb if it happens more than 2
>>> times. Capacity planning is tough, let's hope it doesn't disappear
>>> altogether.
>>>
>>> G'luck
>>>
>>>
>>> -Original Message-
>>> From: S.L [mailto:simpleliving...@gmail.com]
>>> Sent: Monday, October 27, 2014 9:25 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
>>> out of synch.
>>>
>>> Good point about ZK logs , I do see the following exceptions
>>> intermittently in the ZK log.
>>>
>>> 2014-10-27 06:54:14,621 [myid:1] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
>>> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
>>> 2014-10-27 07:00:06,697 [myid:1] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
>>> connection from /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,725 [myid:1] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
>>> establish new session at /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,746 [myid:1] - INFO
>>> [CommitProcessor:1:ZooKeeperServer@617] - Established session
>>> 0x14949db9da40037 with negotiated timeout 1 for client
>>> /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:01:06,520 [myid:1] - WARN  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>> EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> 0x14949db9da40037, likely client has closed socket
>>>  at
>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>  at
>>>
>>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(
>>> NIOServerCnxnFactory.java:208)
>>>  at java.lang.Thread.run(Thread.java:744)
>>>
>>> For queuing theory , I dont know of any way to see how fasts the requests
>>> are being served by SolrCloud , and if a queue is being maintained if the
>>> service rate is slower than the rate of requests from the incoming
>>> multiple
&g

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28 Thread S.L
Yeah, I get that not using MapReduceIndexerTool could be more resource
intensive, but the way this issue manifests, resulting in disjoint
SolrCloud replicas, perplexes me.

While you were tuning your SolrCloud environment to cater to the Hadoop
indexing requirements, did you ever face this issue of disjoint replicas?

Is MapReduceIndexerTool specific to the Cloudera distro? I am using Apache
Solr and Hadoop.

Thanks



On Tue, Oct 28, 2014 at 1:27 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> We index directly from mappers using SolrJ. It does work, but you pay the
> price of having to instantiate all those sockets vs. the way
> MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer
> directly in the Reduce task.
>
> You don't *need* to use MapReduceIndexerTool, but it's more efficient, and
> if you don't, you then have to make sure to appropriately tune your Hadoop
> implementation to match what your Solr installation is capable of.
>
> On 10/28/14 12:39, S.L wrote:
>
>> Will,
>>
>> I think in one of your other emails(which I am not able to find) you has
>> asked if I was indexing directly from MapReduce jobs, yes I am indexing
>> directly from the map task and that is done using SolrJ with a
>> SolrCloudServer initialized with the ZK ensemble URLs.Do I need to use
>> something like MapReducerIndexerTool , which I suupose writes to HDFS and
>> that is in a subsequent step moved to Solr index ? If so why ?
>>
>> I dont use any softCommits and do autocommit every 15 seconds , the
>> snippet
>> in the configuration can be seen below.
>>
>>   
>> ${solr.
>> autoSoftCommit.maxTime:-1}
>>   
>>
>>   
>> ${solr.autoCommit.maxTime:15000}
>>
>> true
>>   
>>
>> I looked at the localhost_access.log file ,  all the GET and POST requests
>> have a sub-second response time.
>>
>>
>>
>>
>> On Tue, Oct 28, 2014 at 2:06 AM, Will Martin 
>> wrote:
>>
>>  The easiest, and coarsest measure of response time [not service time in a
>>> distributed system] can be picked up in your localhost_access.log file.
>>> You're using tomcat write?  Lookup AccessLogValve in the docs and
>>> server.xml. You can add configuration to report the payload and time to
>>> service the request without touching any code.
>>>
>>> Queueing theory is what Otis was talking about when he said you've
>>> saturated your environment. In AWS people just auto-scale up and don't
>>> worry about where the load comes from; its dumb if it happens more than 2
>>> times. Capacity planning is tough, let's hope it doesn't disappear
>>> altogether.
>>>
>>> G'luck
>>>
>>>
>>> -Original Message-
>>> From: S.L [mailto:simpleliving...@gmail.com]
>>> Sent: Monday, October 27, 2014 9:25 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas
>>> out of synch.
>>>
>>> Good point about ZK logs , I do see the following exceptions
>>> intermittently in the ZK log.
>>>
>>> 2014-10-27 06:54:14,621 [myid:1] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
>>> client /xxx.xxx.xxx.xxx:56877 which had sessionid 0x34949dbad580029
>>> 2014-10-27 07:00:06,697 [myid:1] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket
>>> connection from /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,725 [myid:1] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to
>>> establish new session at /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:00:06,746 [myid:1] - INFO
>>> [CommitProcessor:1:ZooKeeperServer@617] - Established session
>>> 0x14949db9da40037 with negotiated timeout 1 for client
>>> /xxx.xxx.xxx.xxx:37336
>>> 2014-10-27 07:01:06,520 [myid:1] - WARN  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>> EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> 0x14949db9da40037, likely client has closed socket
>>>  at
>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>>>  at
>>>
>>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(
>>> NIOServerCnxnFact

Re: Missing Records

2014-10-30 Thread S.L
I am curious: how many shards do you have, and what's the replication factor
you are using?

On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke  wrote:

> Hi All,
>
> We have a SOLR cloud instance that has been humming along nicely for
> months.
> Last week we started experiencing missing records.
>
> Admin DIH Example:
> Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s)
> A *:* search claims that there are only 903,902 this is the first full
> index.
> Subsequent full indexes give the following counts for the *:* search
> 903,805
> 903,665
> 826,357
>
> All the while the admin returns: Fetched: 903,993 (x/s), Skipped: 0,
> Processed: 903,993 (x/s) every time. ---records per second is variable
>
>
> I found an item that should be in the index but is not found in a search.
>
> Here are the referenced lines of the log file.
>
> DEBUG - 2014-10-30 15:10:51.160;
> org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE
> add{,id=750041421}
> {{params(debug=false&optimize=true&indent=true&commit=true&clean=true&wt=json&command=full-import&entity=ads&verbose=false),defaults(config=data-config.xml)}}
> DEBUG - 2014-10-30 15:10:51.160;
> org.apache.solr.update.SolrCmdDistributor; sending update to
> http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0
> add{,id=750041421}
> params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica1%2F
>
> --- there are 746 lines of log between entries ---
>
> DEBUG - 2014-10-30 15:10:51.340; org.apache.http.impl.conn.Wire;  >>
> "[0x2][0xc3][0xe0]¶ms[0xa2][0xe0].update.distrib(TOLEADER[0xe0],distrib.from?[0x17]
> http://192.168.20.57:8983/solr/inventory_shard1_replica1/[0xe0]&delByQ[0x0][0xe0]'docsMap[0xe][0x13][0x10]8[0x8]?[0x80][0x0][0x0][0xe0]#Zip%51106[0xe0]-IsReelCentric[0x2][0xe0](HasPrice[0x1][0xe0]*Make_Lower'ski-doo[0xe0])StateName$Iowa[0xe0]-OriginalModel/Summit
> Highmark[0xe0]/VerticalSiteIDs!2[0xe0]-ClassBinaryIDp@[0xe0]#lat(42.48929[0xe0]-SubClassFacet01704|Snowmobiles[0xe0](FuelType%Other[0xe0]2DivisionName_Lower,recreational[0xe0]&latlon042.4893,-96.3693[0xe0]*PhotoCount!8[0xe0](HasVideo[0x2][0xe0]"ID)750041421[0xe0]&Engine
> [0xe0]*ClassFacet.12|Snowmobiles[0xe0]$Make'Ski-Doo[0xe0]$City*Sioux
> City[0xe0]#lng*-96.369302[0xe0]-Certification!N[0xe0]0EmotionalTagline0162"
> Long Track
> [0xe0]*IsEnhanced[0x1][0xe0]*SubClassID$1704[0xe0](NetPrice$4500[0xe0]1IsInternetSpecial[0x2][0xe0](HasPhoto[0x1][0xe0]/DealerSortOrder!2[0xe0]+Description?VThis
> Bad boy will pull you through the deepest snow!With the 162" track and
> 1000cc of power you can fly up any
> hill!![0xe0],DealerRadius+8046.72[0xe0],Transmission
> [0xe0]*ModelFacet7Ski-Doo|Summit Highmark[0xe0]/DealerNameFacet9Certified
> Auto,
> Inc.|4150[0xe0])StateAbbr"IA[0xe0])ClassName+Snowmobiles[0xe0](DealerID$4150[0xe0]&AdCode$DX1Q[0xe0]*DealerName4Certified
> Auto,
> Inc.[0xe0])Condition$Used[0xe0]/Condition_Lower$used[0xe0]-ExteriorColor+Blue/Yellow[0xe0],DivisionName,Recreational[0xe0]$Trim(1000
> SDI[0xe0](SourceID!1[0xe0]0HasAdEnhancement!0[0xe0]'ClassID"12[0xe0].FuelType_Lower%other[0xe0]$Year$2005[0xe0]+DealerFacet?[0x8]4150|Certified
> Auto, Inc.|Sioux City|IA[0xe0],SubClassName+Snowmobiles[0xe0]%Model/Summit
> Highmark[0xe0])EntryDate42011-11-17T10:46:00Z[0xe0]+StockNumber&000105[0xe0]+PriceRebate!0[0xe0]+Model_Lower/summit
> highmark[\n]"
> What could be the issue and how does one fix this issue?
>
> Thanks so much and if more information is needed I have preserved the log
> files.
>
> AJ
>


Master Slave set up in Solr Cloud

2014-10-30 Thread S.L
Hi All,

As I previously reported, due to no overlap in terms of the documents in the
SolrCloud replicas of the index shards, I have turned off replication and
basically have three shards with a replication factor of 1.

This obviously will not be scalable, because the same core will be indexed
and queried at the same time, as this is a long-running indexing task.

My question is: what options do I have to set up replicas of the single
per-shard core outside of the SolrCloud replication-factor mechanism, since
that does not seem to work for me?


Thanks.


Re: Master Slave set up in Solr Cloud

2014-11-02 Thread S.L
Resending this, as I might not have been clear in my earlier query.

I want to use SolrCloud for everything except replication. Is it possible to
set up a master/slave configuration using different Solr instances and still
be able to use the sharding feature provided by SolrCloud?

On Thu, Oct 30, 2014 at 6:18 PM, S.L  wrote:
> Hi All,
>
> As I previously reported due to no overlap in terms of the documets in the
> SolrCloud replicas of the index shards , I have turned off the replication
> and basically have there shards with a replication factor of 1.
>
> It obviously seems will not be scalable due to the fact that the same core
> will be indexed and queried at the same time as this is a long running
> indexing task.
>
> My questions is what options do I have to set up the replicas of the single
> per shard core outside of the SolrCloud replication factor mechanism because
> that does not seem to work for me ?
>
>
> Thanks.
>


Different ids for the same document in different replicas.

2014-11-11 Thread S.L
Hi All,

I am seeing interesting behavior on the replicas. I have a single shard and
6 replicas on SolrCloud 4.10.1, and only a small number of documents (~375)
that are replicated across the six replicas.

The interesting thing is that the same document has a different id in each
one of those replicas.

This is causing fq=(id:xyz)-type queries to fail, depending on which
replica the query goes to.

I have specified the id field in the following manner in schema.xml; is
this the right way to specify an auto-generated id in SolrCloud?

<field name="id" type="uuid" indexed="true" stored="true" default="NEW" required="true" multiValued="false" />

Thanks.


Re: Different ids for the same document in different replicas.

2014-11-12 Thread S.L
Thanks.

So the issue here is that I already have a doctorId defined in my
schema.xml.

If, along with that, I also want the id field to be automatically generated
for each document, do I have to declare it as a <uniqueKey> as well? I just
tried the following setting without the uniqueKey for id, and it is only
generating blank ids for me.

*schema.xml*

<field name="id" type="string" indexed="true" stored="true" multiValued="false" />

*solrconfig.xml*

<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>



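One client-side alternative, per the UUIDField javadoc note quoted in this thread, is to assign the id on the client before the document is sent to SolrCloud, so every replica indexes the identical value. A minimal sketch of the idea in Python (the thread's actual indexing code is Java/SolrJ, and the field names here are just illustrative; the same approach applies there):

```python
import uuid

def with_client_side_id(doc, id_field="id"):
    """Assign a client-generated UUID before the document is submitted,
    so all replicas receive the same id value (unlike default="NEW",
    which is evaluated independently on each replica)."""
    if id_field not in doc:
        doc = dict(doc, **{id_field: str(uuid.uuid4())})
    return doc

if __name__ == "__main__":
    d = with_client_side_id({"doctorId": "D-42", "name": "example"})
    print(d["id"])
```

Any pre-set id is left untouched, so re-submitting an already-identified document stays idempotent.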
On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm <
garthgr...@averyranchconsulting.com> wrote:

> Looking a little deeper, I did find this about UUIDField
>
>
> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
>
> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
> not advisable for most users when using SolrCloud (and not possible if the
> UUID value is configured as the unique key field) since the result will be
> that each replica of each document will get a unique UUID value. Using
> UUIDUpdateProcessorFactory<
> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
> to generate UUID values when documents are added is recomended instead.”
>
> That might describe the behavior you saw.  And the use of
> UUIDUpdateProcessorFactory to auto generate ID’s seems to be covered well
> here:
>
>
> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
>
> Though I’ve not actually tried that process before.
>
> On Nov 11, 2014, at 7:39 PM, Garth Grimm <
> garthgr...@averyranchconsulting.com> wrote:
>
> “uuid” isn’t an out of the box field type that I’m familiar with.
>
> Generally, I’d stick with the out of the box advice of the schema.xml
> file, which includes things like….
>
>   
>required="true" multiValued="false" />
>
> and…
>
> 
> id
>
> If you’re creating some key/value pair with uuid as the key as you feed
> documents in, and you know that the uuid values you’re creating are unique,
> just change the field name and unique key name from ‘id’ to ‘uuid’.  Or
> change the key name you send in from ‘uuid’ to ‘id’.
>
> On Nov 11, 2014, at 7:18 PM, S.L <simpleliving...@gmail.com> wrote:
>
> Hi All,
>
> I am seeing interesting behavior on the replicas , I have a single
> shard and 6 replicas and on SolrCloud 4.10.1 . I  only have a small
> number of documents ~375 that are replicated across the six replicas .
>
> The interesting thing is that the same  document has a different id in
> each one of those replicas .
>
> This is causing the fq(id:xyz) type queries to fail, depending on
> which replica the query goes to.
>
> I have  specified the id field in the following manner in schema.xml,
> is it the right way to specifiy an auto generated id in  SolrCloud ?
>
>  required="true" multiValued="false" />
>
>
> Thanks.
>
>
>


Re: Different ids for the same document in different replicas.

2014-11-12 Thread S.L
Just tried adding the uniqueKey for id while keeping the id
type="string"; only blank ids are being generated. It looks like the id
is auto-generated only if the id field is set to type uuid, but in
SolrCloud that id will be unique per replica.

Is there a way in SolrCloud to generate a unique id without using the
uuid type, and without ending up with a per-replica unique id?

The uuid in question is of type .




On Wed, Nov 12, 2014 at 6:20 PM, S.L  wrote:

> Thanks.
>
> So the issue here is I already have a doctorId
> defined in my schema.xml.
>
> If along with that I also want the  field to be automatically
> generated for each document do I have to declare it as a  as
> well , because I just tried the following setting without the uniqueKey for
> id and its only generating blank ids for me.
>
> *schema.xml*
>
>  required="true" multiValued="false" />
>
> *solrconfig.xml*
>
>   
>
> 
> id
> 
> 
> 
>
>
> On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm <
> garthgr...@averyranchconsulting.com> wrote:
>
>> Looking a little deeper, I did find this about UUIDField
>>
>>
>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
>>
>> "NOTE: Configuring a UUIDField instance with a default value of "NEW" is
>> not advisable for most users when using SolrCloud (and not possible if the
>> UUID value is configured as the unique key field) since the result will be
>> that each replica of each document will get a unique UUID value. Using
>> UUIDUpdateProcessorFactory<
>> http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html>
>> to generate UUID values when documents are added is recomended instead.”
>>
>> That might describe the behavior you saw.  And the use of
>> UUIDUpdateProcessorFactory to auto generate ID’s seems to be covered well
>> here:
>>
>>
>> http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
>>
>> Though I’ve not actually tried that process before.
>>
>> On Nov 11, 2014, at 7:39 PM, Garth Grimm <
>> garthgr...@averyranchconsulting.com> wrote:
>>
>> “uuid” isn’t an out of the box field type that I’m familiar with.
>>
>> Generally, I’d stick with the out of the box advice of the schema.xml
>> file, which includes things like….
>>
>>   
>>   > required="true" multiValued="false" />
>>
>> and…
>>
>> 
>> id
>>
>> If you’re creating some key/value pair with uuid as the key as you feed
>> documents in, and you know that the uuid values you’re creating are unique,
>> just change the field name and unique key name from ‘id’ to ‘uuid’.  Or
>> change the key name you send in from ‘uuid’ to ‘id’.
>>
>> On Nov 11, 2014, at 7:18 PM, S.L <simpleliving...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I am seeing interesting behavior on the replicas , I have a single
>> shard and 6 replicas and on SolrCloud 4.10.1 . I  only have a small
>> number of documents ~375 that are replicated across the six replicas .
>>
>> The interesting thing is that the same  document has a different id in
>> each one of those replicas .
>>
>> This is causing the fq(id:xyz) type queries to fail, depending on
>> which replica the query goes to.
>>
>> I have  specified the id field in the following manner in schema.xml,
>> is it the right way to specifiy an auto generated id in  SolrCloud ?
>>
>>   >   required="true" multiValued="false" />
>>
>>
>> Thanks.
>>
>>
>>
>


Can we query on _version_field ?

2014-11-12 Thread S.L
Hi All,

We know that _version_ is a mandatory field in a SolrCloud schema.xml;
it is expected to be of type long, and it also seems to have a unique
value in a collection.

However, a query of the form
http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:148463254894438%29&wt=json
does not seem to return any record. Can we query on the _version_ field
defined in the schema.xml?

Thank you.


Re: Can we query on _version_field ?

2014-11-13 Thread S.L
Here is why I want to do this .

1. My unique key is an http URL, doctorURL.
2. If I do a lookup based on the URL, I am bound to face issues with
character escaping.
3. To avoid that I was using a UUID for lookup, but in SolrCloud it
generates a unique value per replica, which is not acceptable.
4. Now I see that the mandatory _version_ field has a unique value per
document, and not per replica, so I am exploring the use of _version_ to
do a lookup only and not necessarily use it as a unique key; is that
doable in this case?
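As an aside on point 2: the escaping problem for URL-valued keys can be sidestepped with the {!term} query parser, which treats everything after the closing brace as one raw term, so no Lucene query-syntax escaping is needed (the value still has to be percent-encoded at the HTTP level). A sketch with an illustrative URL value:

```
http://server1.mydomain.com:7344/solr/collection1/select?q=*:*&fq={!term f=doctorURL}http%3A%2F%2Fwww.example.com%2Fdoctors%2F123&wt=json
```

This assumes doctorURL is a string-typed (non-analyzed) field, which a uniqueKey normally is.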

On Thu, Nov 13, 2014 at 8:58 AM, Erick Erickson 
wrote:

> Really, I have to ask why you would want to. This is really purely an
> internal
> thing. I don't know what practical value there would be to search on this?
>
> Interestingly, I can search _version_:[100 TO *], but specific searches
> seem to fail.
>
> I wonder if there's something wonky going on with searching on large longs
> here.
>
> Feels like an XY problem to me though.
>
> Best,
> Erick
>
> On Thu, Nov 13, 2014 at 12:45 AM, S.L  wrote:
> > Hi All,
> >
> > We know that _version_field is a mandatory field in solrcloud schema.xml,
> > it is expected to be of type long , it also seems to have unique value
> in a
> > collection.
> >
> > However the query of the form
> >
> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:148463254894438%29&wt=json
> > does not seems to return any record , can we query on the _version_field
> in
> > the schema.xml ?
> >
> > Thank you.
>


Re: Can we query on _version_field ?

2014-11-13 Thread S.L
Erick,

1. "_version_ will change on updates" — shouldn't that be OK? My
understanding is that an update here means a new document will be
inserted with the same unique key in my case, which will effectively
replace the document. This will not be an issue for me, because the
initial search results, based on the unique key, would show basic doctor
data, and when that tile is clicked on, detail data would be displayed
based on a lookup of the _version_ id. So as long as the _version_ does
not change outside of such an "update", I should be good. Of course there
is a possibility of the document being "updated" between the search
results being displayed and the detailed information being requested, but
that possibility is small in my case, because people usually request
details as soon as the initial search results are displayed.


2. Yes, I have used UUIDUpdateProcessorFactory in the following ways, but
none of them solves the issue, especially in SolrCloud.

*Case 1:*

*schema.xml*



This does not generate the unique id at all.

*Case 2:*



In this case a unique id is generated, but it is unique for every
replica, and we end up with different ids for the same document in
different replicas.


In both cases above, the solrconfig.xml had the following entry.

  


id






On Thu, Nov 13, 2014 at 11:01 AM, Erick Erickson 
wrote:

> _version_ will change on updates I'm pretty sure, so I doubt
> it's suitable.
>
> I _think_ you can use a UUIDUPdateProcessorFactory here.
> I haven't checked this personally, but the idea here is
> that the UUID cannot be assigned on the shard. But if you're
> checking this out, if the UUID is assigned _before_ the doc
> is sent to the destination shard, it should be fine.
>
> Have you checked that out? I'm at a conference, so I can't
> check it out too thoroughly right now...
>
> Best,
> Erick
>
> On Thu, Nov 13, 2014 at 10:18 AM, S.L  wrote:
> > Here is why I want to do this .
> >
> > 1. My unique key is a http URL, doctorURL.
> > 2. If I do a look up based on URL , I am bound to face issues with
> > character escaping and all.
> > 3. To avoid that I was using a UUID for look up , but in SolrCloud it
> > generates unique per replica , which is not acceptable.
> > 4. Now I see that the mandatory _version_ field has a unique value per
> > document and and not unique per replica , so I am exploring the use of
> > _version_ to do a look up only and not neccesarily use it as a unique
> key,
> > is it do able in that case ?
> >
> > On Thu, Nov 13, 2014 at 8:58 AM, Erick Erickson  >
> > wrote:
> >
> >> Really, I have to ask why you would want to. This is really purely an
> >> internal
> >> thing. I don't know what practical value there would be to search on
> this?
> >>
> >> Interestingly, I can search _version_:[100 TO *], but specific
> searches
> >> seem to fail.
> >>
> >> I wonder if there's something wonky going on with searching on large
> longs
> >> here.
> >>
> >> Feels like an XY problem to me though.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Nov 13, 2014 at 12:45 AM, S.L 
> wrote:
> >> > Hi All,
> >> >
> >> > We know that _version_field is a mandatory field in solrcloud
> schema.xml,
> >> > it is expected to be of type long , it also seems to have unique value
> >> in a
> >> > collection.
> >> >
> >> > However the query of the form
> >> >
> >>
> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:148463254894438%29&wt=json
> >> > does not seems to return any record , can we query on the
> _version_field
> >> in
> >> > the schema.xml ?
> >> >
> >> > Thank you.
> >>
>


Re: Can we query on _version_field ?

2014-11-13 Thread S.L
I am not sure if this is a case of an XY problem.

I have no control over the URLs to deduce an id from; they come from the
web. I made the URL the uniqueKey so that the document gets replaced when
a new document with that URL comes in.

To do the detail lookup I can either use that uniqueKey as it is, or try
to generate a unique id field for each document.

For the latter option, UUID is not behaving as expected in SolrCloud, and
the _version_ field seems to be serving the need.
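A deterministic middle ground, in the spirit of the dedupe chain suggested later in the thread, is to hash the uniqueKey into a separate lookup field with the stock SignatureUpdateProcessorFactory: the same doctorURL always produces the same signature, on every replica. A sketch, with the target field name (docId, assumed to be declared as a plain string field in schema.xml) as an assumption:

```xml
<!-- solrconfig.xml: derive a stable lookup id from doctorURL -->
<updateRequestProcessorChain name="signature">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">docId</str>
    <str name="fields">doctorURL</str>
    <str name="signatureClass">solr.processor.MD5Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```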

On Thu, Nov 13, 2014 at 11:35 AM, Shawn Heisey  wrote:

> On 11/12/2014 10:45 PM, S.L wrote:
> > We know that _version_field is a mandatory field in solrcloud schema.xml,
> > it is expected to be of type long , it also seems to have unique value
> in a
> > collection.
> >
> > However the query of the form
> >
> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:148463254894438%29&wt=json
> > does not seems to return any record , can we query on the _version_field
> in
> > the schema.xml ?
>
> I've been watching your journey unfold on the mailing list.  The whole
> thing seems like an XY problem.
>
> If I'm reading everything correctly, you want to have a unique ID value
> that can serve as the uniqueKey, as well as a way to quickly look up a
> single document in Solr.
>
> Is there one part of the URL that serves as a unique identifier that
> doesn't contain special characters?  It seems insane that you would not
> have a unique ID value for every entity in your system that is composed
> of only "regular" characters.
>
> Assuming that such an ID exists (and is likely used as one piece of that
> doctorURL that you mentioned) ... if you can extract that ID value into
> its own field (either in your indexing code or a custom update
> processor), you could use that for both uniqueKey and single-document
> lookups.  Having that kind of information in your index seems like a
> generally good idea.
>
> Thanks,
> Shawn
>
>


Re: Can we query on _version_field ?

2014-11-13 Thread S.L
Garth and Erick,

I am now successfully able to auto-generate ids using a UUID
updateRequestProcessorChain, by giving the id field the type string.

Thanks for your help folks.
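For anyone hitting the same problem: the working setup described above presumably looks roughly like the following (a sketch based on the solr.pl article linked earlier in the thread; the names are illustrative, not the poster's verbatim files). The two key points are that the generated field is a plain string, and that the UUID is assigned before the document is forwarded to replicas, so all copies store the same value:

```xml
<!-- schema.xml: a plain string field for the generated id -->
<field name="id" type="string" indexed="true" stored="true"
       multiValued="false" />

<!-- solrconfig.xml: generate the UUID once, on the node that first
     receives the update, before distribution to the replicas -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- make the chain the default for updates -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```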

On Thu, Nov 13, 2014 at 1:31 PM, Garth Grimm <
garthgr...@averyranchconsulting.com> wrote:

> So it sounds like you’re OK with using the docURL as the unique key for
> routing in SolrCloud, but you don’t want to use it as a lookup mechanism.
>
> If you don’t want to do a hash of it and use that unique value in a second
> unique field and feed time,
> and you can’t seem to find any other field that might be unique,
> and you don’t want to make your own UpdateRequestProcessorChain that would
> generate a unique field from your unique key (such as by doing an MD5 hash),
> you might look at the UpdateRequestProcessorChain named “deduce” in the
> OOB solrconfig.xml.  It’s primarily designed to help dedupe results, but
> it’s technique is to concatenate multiple fields together to create a
> signature that will be unique in some way.  So instead of having to find
> one field in your data that’s unique, you could look for a couple of fields
> that, if combined, would create a unique field, and configure the “dedupe”
> Processor to handle that.
>
>
> > On Nov 13, 2014, at 12:02 PM, S.L  wrote:
> >
> > I am not sure if this a case of XY problem.
> >
> > I have no control over the URLs to deduce an id from them , those are
> from
> > www, I made the URL the uniqueKey , that way the document gets replaced
> > when a new document with that URL comes in .
> >
> > To do the detail look up I can either use the same  as it is , or
> > try and generate a unique id filed for each document.
> >
> > For the later option UUID is not behaving as expected in SolrCloud and
> > _version_ field seems to be serving the need .
> >
> > On Thu, Nov 13, 2014 at 11:35 AM, Shawn Heisey 
> wrote:
> >
> >> On 11/12/2014 10:45 PM, S.L wrote:
> >>> We know that _version_field is a mandatory field in solrcloud
> schema.xml,
> >>> it is expected to be of type long , it also seems to have unique value
> >> in a
> >>> collection.
> >>>
> >>> However the query of the form
> >>>
> >>
> http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:148463254894438%29&wt=json
> >>> does not seems to return any record , can we query on the
> _version_field
> >> in
> >>> the schema.xml ?
> >>
> >> I've been watching your journey unfold on the mailing list.  The whole
> >> thing seems like an XY problem.
> >>
> >> If I'm reading everything correctly, you want to have a unique ID value
> >> that can serve as the uniqueKey, as well as a way to quickly look up a
> >> single document in Solr.
> >>
> >> Is there one part of the URL that serves as a unique identifier that
> >> doesn't contain special characters?  It seems insane that you would not
> >> have a unique ID value for every entity in your system that is composed
> >> of only "regular" characters.
> >>
> >> Assuming that such an ID exists (and is likely used as one piece of that
> >> doctorURL that you mentioned) ... if you can extract that ID value into
> >> its own field (either in your indexing code or a custom update
> >> processor), you could use that for both uniqueKey and single-document
> >> lookups.  Having that kind of information in your index seems like a
> >> generally good idea.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>


Boosting the score using edismax for a non empty and non indexed field.

2014-12-07 Thread S.L
Hi All,

I have a situation where I need to boost the score of a query if a field
(imageURL) in the given document is non-empty. I am using edismax, so I
know that using the bq parameter would solve the problem. However, the
field imageURL that I am trying to boost on is not indexed (stored=true
and indexed=false). Can I use the bq parameter for a non-indexed field,
or should I be looking at re-indexing after changing the schema to make
it an indexed field?

Also, my use case is such that I want the documents that have an imageURL
to be boosted so that they appear before those that do not have one when
sorted by score in descending order. This field, imageURL, is sometimes
present and sometimes not, which is why I am looking at boosting the
score of the documents that have it.

Thanks in advance; any help and suggestion is much appreciated!
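For what it's worth: bq, like any query clause, can only match indexed terms, so with stored=true and indexed=false there is nothing for the boost query to match; the field would need indexed="true" and a re-index. Once indexed, presence of any value can be boosted with an open range query. A sketch, with the handler defaults, the qf field, and the boost factor all as assumptions:

```xml
<!-- solrconfig.xml: edismax defaults that boost any document whose
     (indexed) imageURL field holds at least one value -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">phoneName</str>
    <!-- [* TO *] matches every document with some value in imageURL;
         ^2.0 is an illustrative boost factor -->
    <str name="bq">imageURL:[* TO *]^2.0</str>
  </lst>
</requestHandler>
```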


Length norm not functioning in solr queries.

2014-12-08 Thread S.L
I have two documents doc1 and doc2 and each one of those has a field called
phoneName.

doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
Smartphone Factory Unlocked"

doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"

Here if I search for
q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true

Doc1 and doc2 both have the same identical score of 9.961212, but since
the phoneName field in doc2 is shorter, I would expect it to score
higher.

The phoneName field is defined as follows. As you can see, nowhere am I
specifying omitNorms=true, yet the behavior seems to be that the length
norm is not functioning at all. Can someone tell me what the issue is
here?
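For reference, the norm that Lucene's DefaultSimilarity bakes into each field at index time is

```latex
\mathrm{norm}(d) \;=\; \mathrm{boost} \times \mathrm{lengthNorm}(d)
                 \;=\; \mathrm{boost} \times \frac{1}{\sqrt{\mathrm{numTerms}(d)}}
```

and this float is quantized into a single byte (roughly one significant decimal digit of precision) before being stored, so two short fields with nearby token counts can collapse to the same stored norm and contribute identically to the score — the precision loss discussed later in this thread is a likely suspect here.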


Re: Boosting the score using edismax for a non empty and non indexed field.

2014-12-08 Thread S.L
Anyone ?

On Mon, Dec 8, 2014 at 2:45 AM, S.L  wrote:

> Hi All,
>
> I have a situation where I need to boost the score of a query if a field
> (imageURL) in the given document is non empty , I am using edismax so I
> know that using bq parameter would solve the problem. However the field
> imageURL that  I am trying to boost on is not indexed , meaning (stored =
> true and indexed = false), can I use the bq parameter for a non indexed
> field ? or should I be looking at re-indexing after changing the schema to
> make this an indexed field ?
>
> Also , my use case is such that I want the documents that have an imageURL
> to be boosted so that they appear before those documents that do not have
> the imageURL when sorted by score in a descending order, and this field in
> question i.e. imageURL is sometimes present  and sometimes not, that is why
> I am looking at boosting the score of those documents that have the
> imageURL present.
>
> Thanks and any help and suggestionis much appreciated!
>
>
>


Re: Length norm not functioning in solr queries.

2014-12-09 Thread S.L
Hi ,

Thanks Mikhail. I looked at the explain output, and this is what I see
for the two documents in question: they have identical scores even though
document 2 has a shorter productName field. I do not see any lengthNorm
related information in the explain output.

Also, I am not exactly clear on what needs to be looked at in the API.

*Search Query* : q=iphone+4s+16gb&qf=productName&mm=1&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true

*productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
Unlocked *


   - *100%* 10.649221 sum of the following:
  - *10.58%* 1.1270299 sum of the following:
 - *2.1%* 0.22383358 productName:iphon
 - *3.47%* 0.36922288 productName:"4 s"
 - *5.01%* 0.53397346 productName:"16 gb"
  - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
  - *27.79%* 2.959255 sum of the following:
 - *10.97%* 1.1680154 productName:"iphon 4 s"~1
 - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
  - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1


*productName Apple iPhone 4S 16GB for Net10, No Contract, White*


   - *100%* 10.649221 sum of the following:
  - *10.58%* 1.1270299 sum of the following:
 - *2.1%* 0.22383358 productName:iphon
 - *3.47%* 0.36922288 productName:"4 s"
 - *5.01%* 0.53397346 productName:"16 gb"
  - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
  - *27.79%* 2.959255 sum of the following:
 - *10.97%* 1.1680154 productName:"iphon 4 s"~1
 - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
  - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1




On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> It's worth to look into  to check particular scoring values. But
> for most suspect is the reducing precision when float norms are stored in
> byte vals. See javadoc for DefaultSimilarity.encodeNormValue(float)
>
>
> On Mon, Dec 8, 2014 at 5:49 PM, S.L  wrote:
>
> > I have two documents doc1 and doc2 and each one of those has a field
> called
> > phoneName.
> >
> > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > Smartphone Factory Unlocked"
> >
> > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> >
> > Here if I search for
> >
> >
> q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> >
> > Doc1 and Doc2 both have the same identical score , but since the field
> > phoneName in the doc2 has shorter length I would expect it to have a
> higher
> > score , but both have an identical score of 9.961212.
> >
> > The phoneName filed is defined as follows.As we can see no where am I
> > specifying omitNorms=True, still the behavior seems to be that the length
> > norm is not functioning at all. Can some one let me know whats the issue
> > here ?
> >
> >  > stored="true" required="true" />
> >  > positionIncrementGap="100" autoGeneratePhraseQueries="true">
> > 
> > 
> > 
> > 
> >  > words="lang/stopwords_en.txt"
> > enablePositionIncrements="true" />
> >  > generateWordParts="1" generateNumberParts="1"
> > catenateWords="1"
> > catenateNumbers="1" catenateAll="0"
> > splitOnCaseChange="1" />
> > 
> >  > protected="protwords.txt" />
> > 
> > 
> > 
> > 
> >  > synonyms="synonyms.txt"
> > ignoreCase="true" expand="true" />
> >  > words="lang/stopwords_en.txt"
> > enablePositionIncrements="true" />
> >  > generateWordParts="1" generateNumberParts="1"
> > catenateWords="0"
> > catenateNumbers="0" catenateAll="0"
> > splitOnCaseChange="1" />
> > 
> >  > protected="protwords.txt" />
> > 
> > 
> > 
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: Length norm not functioning in solr queries.

2014-12-10 Thread S.L
Hi Ahmet,

Is there already an implementation of the suggested workaround? Thanks.

On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
wrote:

> Hi,
>
> Default length norm is not best option for differentiating very short
> documents, like product names.
> Please see :
> http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
>
> I suggest you to create an additional integer field, that holds number of
> tokens. You can populate it via update processor. And then penalise (using
> fuction queries) according to that field. This way you have more fine
> grained and flexible control over it.
>
> Ahmet
>
>
>
> On Tuesday, December 9, 2014 12:22 PM, S.L 
> wrote:
> Hi ,
>
> Mikhail Thanks , I looked at the explain and this is what I see for the two
> different documents in questions, they have identical scores   even though
> the document 2 has a shorter productName field, I do not see any lenghtNorm
> related information in the explain.
>
> Also I am not exactly clear on what needs to be looked in the API ?
>
> *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> productName&ps=1&pf2= productName&pf3=
> productName&stopwords=true&lowercaseOperators=true
>
> *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> Unlocked *
>
>
>- *100%* 10.649221 sum of the following:
>   - *10.58%* 1.1270299 sum of the following:
>  - *2.1%* 0.22383358 productName:iphon
>  - *3.47%* 0.36922288 productName:"4 s"
>  - *5.01%* 0.53397346 productName:"16 gb"
>   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
>   - *27.79%* 2.959255 sum of the following:
>  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
>  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
>   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
>
>
> *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
>
>
>- *100%* 10.649221 sum of the following:
>   - *10.58%* 1.1270299 sum of the following:
>  - *2.1%* 0.22383358 productName:iphon
>  - *3.47%* 0.36922288 productName:"4 s"
>  - *5.01%* 0.53397346 productName:"16 gb"
>   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
>   - *27.79%* 2.959255 sum of the following:
>  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
>  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
>   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
>
>
>
>
>
> On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
> > It's worth to look into  to check particular scoring values. But
> > for most suspect is the reducing precision when float norms are stored in
> > byte vals. See javadoc for DefaultSimilarity.encodeNormValue(float)
> >
> >
> > On Mon, Dec 8, 2014 at 5:49 PM, S.L  wrote:
> >
> > > I have two documents doc1 and doc2 and each one of those has a field
> > called
> > > phoneName.
> > >
> > > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > > Smartphone Factory Unlocked"
> > >
> > > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > >
> > > Here if I search for
> > >
> > >
> >
> q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> > >
> > > Doc1 and Doc2 both have the same identical score , but since the field
> > > phoneName in the doc2 has shorter length I would expect it to have a
> > higher
> > > score , but both have an identical score of 9.961212.
> > >
> > > The phoneName filed is defined as follows.As we can see no where am I
> > > specifying omitNorms=True, still the behavior seems to be that the
> length
> > > norm is not functioning at all. Can some one let me know whats the
> issue
> > > here ?
> > >
> > >  > > stored="true" required="true" />
> > >  > > positionIncrementGap="100"
> autoGeneratePhraseQueries="true">
> > > 
> > > 
> > > 
> > > 
> > >  ignoreCase="true"
> > > words="lang/stopwords_en.txt"
> > > enablePositionIncrem

Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Mikhail,

Thank you for confirming this; however, Ahmet's proposal seems simpler
for me to implement.

On Wed, Dec 10, 2014 at 5:07 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:
>
> S.L,
>
> I briefly skimmed Lucene50NormsConsumer.writeNormsField(), my conclusion
> is: if you supply own similarity, which just avoids putting float to byte
> in Similarity.computeNorm(FieldInvertState), you get right this value in .
> Similarity.decodeNormValue(long).
> You may wonder but this is what's exactly done in PreciseDefaultSimilarity
> in TestLongNormValueSource. I think you can just use it.
>
> On Wed, Dec 10, 2014 at 12:11 PM, S.L  wrote:
>
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested work around ? Thanks.
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > Default length norm is not best option for differentiating very short
> > > documents, like product names.
> > > Please see :
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you to create an additional integer field, that holds number
> of
> > > tokens. You can populate it via update processor. And then penalise
> > (using
> > > fuction queries) according to that field. This way you have more fine
> > > grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail Thanks , I looked at the explain and this is what I see for the
> > two
> > > different documents in questions, they have identical scores   even
> > though
> > > the document 2 has a shorter productName field, I do not see any
> > lenghtNorm
> > > related information in the explain.
> > >
> > > Also I am not exactly clear on what needs to be looked in the API ?
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > > mkhlud...@griddynamics.com> wrote:
> > >
> > > > It's worth to look into  to check particular scoring values.
> > But
> > > > for most suspect is the reducing precision when float norms are
> stored
> > in
> > > > byte vals. See javadoc for DefaultSimilarity.encodeNormValue(float)
> > > >
> > > >
> > > > On Mon, Dec 8, 2014 at 5:49 PM, S.L 
> wrote:
> > > >
> > > > > I have two documents doc1 and doc2 and each one of those has a
> field
> > > > called
> > > > > phoneName.
> > 

Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Ahmet,

Thank you. Since the configurations in SolrCloud are uploaded to
ZooKeeper, are there any special steps that need to be taken to make this
work in SolrCloud?
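Ahmet's recipe quoted below can be wired up without custom Java; a sketch (the chain, field, and script names are illustrative), where wordcount.js is assumed to be a small script whose processAdd function splits phoneName on whitespace and writes the count into a wordCount int field:

```xml
<!-- solrconfig.xml: populate wordCount at index time via a script -->
<updateRequestProcessorChain name="wordcount">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">wordcount.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

At query time wordCount can then act as a tie-breaker (sort=score desc, wordCount asc) or a penalty (e.g. boost=div(1,wordCount)). As for SolrCloud, nothing special beyond the usual routine: upload the changed config directory (including the .js file) to ZooKeeper and reload the collection so all nodes pick it up.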

On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> Or even better, you can use your new field for tie break purposes. Where
> scores are identical.
> e.g. sort=score desc, wordCount asc
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan 
> wrote:
> Hi,
>
> You mean update processor factory?
>
> Here is augmented (wordCount field added) version of your example :
>
> doc1:
>
> phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> Smartphone Factory Unlocked"
> wordCount: 11
>
> doc2:
>
> phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> wordCount: 9
>
>
> First task is simply calculate wordCount values. You can do it in your
> indexing code, or other places.
> I quickly skimmed existing update processors but I couldn't find stock
> implementation.
> CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> all about multivalued fields.
>
> I guess, A simple javascript that splits on whitespace and returns the
> produced array size would do the trick :
> StatelessScriptUpdateProcessorFactory
>
>
>
> At this point you have a int field named word count.
> boost=div(1,wordCount) should work. Or you can came up with more
> sophisticated math formula.
>
> Ahmet
>
>
> On Wednesday, December 10, 2014 11:12 AM, S.L 
> wrote:
> Hi Ahmet,
>
> Is there already an implementation of the suggested work around ? Thanks.
>
>
> On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > Default length norm is not best option for differentiating very short
> > documents, like product names.
> > Please see :
> > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> >
> > I suggest you to create an additional integer field, that holds number of
> > tokens. You can populate it via update processor. And then penalise
> (using
> > fuction queries) according to that field. This way you have more fine
> > grained and flexible control over it.
> >
> > Ahmet
> >
> >
> >
> > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > wrote:
> > Hi ,
> >
> > Mikhail Thanks , I looked at the explain and this is what I see for the
> two
> > different documents in questions, they have identical scores   even
> though
> > the document 2 has a shorter productName field, I do not see any
> lenghtNorm
> > related information in the explain.
> >
> > Also I am not exactly clear on what needs to be looked in the API ?
> >
> > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > productName&ps=1&pf2= productName&pf3=
> > productName&stopwords=true&lowercaseOperators=true
> >
> > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > Unlocked *
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> > *productName Apple iPhone 4S 16GB for Net10, No Contract, White*
> >
> >
> >- *100%* 10.649221 sum of the following:
> >   - *10.58%* 1.1270299 sum of the following:
> >  - *2.1%* 0.22383358 productName:iphon
> >  - *3.47%* 0.36922288 productName:"4 s"
> >  - *5.01%* 0.53397346 productName:"16 gb"
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >   - *27.79%* 2.959255 sum of the following:
> >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> >  - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
> >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> >
> >
> >
> >
> >
> > On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > It's worth to look 

Re: Length norm not functioning in solr queries.

2014-12-11 Thread S.L
Yes, I understand that reindexing is necessary; however, for some reason I
was not able to invoke the JS script from the update processor, so I ended
up using a Java-only solution at index time.

Thanks.
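
For reference, the index-time computation is trivial on the client side too. A minimal Python sketch (the wordCount field name follows Ahmet's example; skipping punctuation-only tokens such as bare hyphens is an assumption about how the names should be tokenized):

```python
import string

def word_count(name):
    # Count whitespace-separated tokens that contain at least one
    # non-punctuation character, so a bare "-" is not counted.
    return sum(1 for tok in name.split() if tok.strip(string.punctuation))

doc1 = "Details about  Apple iPhone 4s - 16GB - White (Verizon) Smartphone Factory Unlocked"
doc2 = "Apple iPhone 4S 16GB for Net10, No Contract, White"

print(word_count(doc1))  # 11
print(word_count(doc2))  # 9

# At index time the count is just another field on the document, e.g.:
# solr.add([{"id": "doc1", "phoneName": doc1, "wordCount": word_count(doc1)}])
# and queries can then penalise long names with boost=div(1,wordCount).
```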

On Thu, Dec 11, 2014 at 7:18 AM, Ahmet Arslan 
wrote:
>
> Hi,
>
> No special steps to be taken for cloud setup. Please note that for both
> solutions, re-index is mandatory.
>
> Ahmet
>
>
>
> On Thursday, December 11, 2014 12:15 PM, S.L 
> wrote:
> Ahmet,
>
> Thank you , as the configurations in SolrCloud are uploaded to zookeeper ,
> are there any special steps that need to be taken to make this work in
> SolrCloud ?
>
>
> On Wed, Dec 10, 2014 at 4:32 AM, Ahmet Arslan 
> wrote:
> >
> > Hi,
> >
> > Or even better, you can use your new field for tie break purposes. Where
> > scores are identical.
> > e.g. sort=score desc, wordCount asc
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:29 AM, Ahmet Arslan <
> iori...@yahoo.com>
> > wrote:
> > Hi,
> >
> > You mean update processor factory?
> >
> > Here is augmented (wordCount field added) version of your example :
> >
> > doc1:
> >
> > phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > Smartphone Factory Unlocked"
> > wordCount: 11
> >
> > doc2:
> >
> > phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> > wordCount: 9
> >
> >
> > First task is simply calculate wordCount values. You can do it in your
> > indexing code, or other places.
> > I quickly skimmed existing update processors but I couldn't find stock
> > implementation.
> > CountFieldValuesUpdateProcessorFactory fooled me, but it looks like it is
> > all about multivalued fields.
> >
> > I guess, A simple javascript that splits on whitespace and returns the
> > produced array size would do the trick :
> > StatelessScriptUpdateProcessorFactory
> >
> >
> >
> > At this point you have a int field named word count.
> > boost=div(1,wordCount) should work. Or you can came up with more
> > sophisticated math formula.
> >
> > Ahmet
> >
> >
> > On Wednesday, December 10, 2014 11:12 AM, S.L  >
> > wrote:
> > Hi Ahmet,
> >
> > Is there already an implementation of the suggested work around ? Thanks.
> >
> >
> > On Tue, Dec 9, 2014 at 6:41 AM, Ahmet Arslan 
> > wrote:
> >
> > > Hi,
> > >
> > > Default length norm is not best option for differentiating very short
> > > documents, like product names.
> > > Please see :
> > > http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
> > >
> > > I suggest you to create an additional integer field, that holds number
> of
> > > tokens. You can populate it via update processor. And then penalise
> > (using
> > > fuction queries) according to that field. This way you have more fine
> > > grained and flexible control over it.
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Tuesday, December 9, 2014 12:22 PM, S.L 
> > > wrote:
> > > Hi ,
> > >
> > > Mikhail Thanks , I looked at the explain and this is what I see for the
> > two
> > > different documents in questions, they have identical scores   even
> > though
> > > the document 2 has a shorter productName field, I do not see any
> > lenghtNorm
> > > related information in the explain.
> > >
> > > Also I am not exactly clear on what needs to be looked in the API ?
> > >
> > > *Search Query* : q=iphone+4s+16gb&qf= productName&mm=1&pf=
> > > productName&ps=1&pf2= productName&pf3=
> > > productName&stopwords=true&lowercaseOperators=true
> > >
> > > *productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory
> > > Unlocked *
> > >
> > >
> > >- *100%* 10.649221 sum of the following:
> > >   - *10.58%* 1.1270299 sum of the following:
> > >  - *2.1%* 0.22383358 productName:iphon
> > >  - *3.47%* 0.36922288 productName:"4 s"
> > >  - *5.01%* 0.53397346 productName:"16 gb"
> > >   - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
> > >   - *27.79%* 2.959255 sum of the following:
> > >  - *10.97%* 1.1680154 productName:"iphon 4 s"~1
> > >  - *16.82%* 1.7912396 produ

'Illegal character in query' on Solr cloud 4.10.1

2014-12-23 Thread S.L
Hi All,

I am using SolrCloud 4.10.1 and I have 3 shards with a replication factor
of 2, i.e. 6 nodes altogether.

When I query server1 of the 6 nodes in the cluster with the below query,
it works fine, but querying any other node in the cluster with the same
query results in an *HTTP Status 500 - {msg=Illegal character in query
at index 181:*
error.

The character at index 181 is the boost character ^. I have seen a Jira,
SOLR-5971 <https://issues.apache.org/jira/browse/SOLR-5971>, for a similar
issue; how can I overcome it?

The query I use is below. Thanks in advance!

http://xx2..com:8081/solr/dyCollection1_shard2_replica1/?q=x+x+xx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName
^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true


Re: 'Illegal character in query' on Solr cloud 4.10.1

2014-12-24 Thread S.L
Erik,

Scenario 1 that you listed seems to be the case.

When I add distrib=false and query each of the 6 servers individually, only
1 of them returns (partial) results and the rest give the illegal character
error.

I have not set up any special logging. I do not see any info in catalina.out,
but in a file called localhost_access_log.2014-12-24.txt in the tomcat/logs
directory, I see the following log message when the invalid
character error occurs.

[24/Dec/2014:09:25:54 +] "GET
/solr/dyCollection1_shard2_replica1/?fl=*,score&q=canon+pixma+printer&sort=score+desc,productNameLength%20asc&wt=json&indent=true&rows=100&defType=edismax&qf=productName&mm=2&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true&bq=hasThumbnailImage:true^2.0&distrib=false
HTTP/1.1" 500 7781

I am using Tomcat 7.0.42, SolrCloud 4.10.1, and the Oracle JDK:

java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

Thanks.

On Tue, Dec 23, 2014 at 11:46 AM, Erick Erickson 
wrote:

> Hmmm, so you are you pinging the servers directly, right?
> Here's a couple of things to try:
> 1> add &distrib=false to the query and try each of the 6 servers.
> What I'm wondering is if this is happening on the sub-query sent
> out or on the primary server. Adding &distrib=false will just execute
> on the node you're sending it to, and will NOT send sub-queries out
> to any other node so you'll get partial results back.
>
> If one server continues to work but the other 5 fail, then your servlet
> container is probably not set up with the right character sets. Although
> why that would manifest itself on the ^ character mystifies me.
>
> 2> Let's assume that all 6 servers handle the raw query. Next thing that
> would be really helpful is to see the sub-queries. Take &distrib=false
> off and tail the logs on all the servers. What we're looking for here is
> whether the sub-queries even make it to Solr or whether the problem
> is in your container.
>
> 3> If the sub-queries do NOT make it to the Solr logs, what is the query
> that the container sees? Is it recognizable or has Solr somehow munged
> the sub-query?
>
> What is your environment like? Tomcat? Jetty? Other? What JVM
> etc?
>
> Best,
> Erick
>
> On Tue, Dec 23, 2014 at 3:23 AM, S.L  wrote:
> > Hi All,
> >
> > I am using SolrCloud 4.10.1 and I have 3 shards with replication factor
> of
> > 2 , i.e is 6 nodes altogether.
> >
> > When I query the server1 out of 6 nodes in the cluster with the below
> query
> > , it works fine , but any other node in the cluster when queried with the
> > same query results in a *HTTP Status 500 - {msg=Illegal character in
> query
> > at index 181:*
> > error.
> >
> > The character at index 181 is the boost character ^. I have see a Jira
> > SOLR-5971 <https://issues.apache.org/jira/browse/SOLR-5971> for a
> similar
> > issue , how can I overcome this issue.
> >
> > The query I use is below. Thanks in Advance!
> >
> >
> http://xx2..com:8081/solr/dyCollection1_shard2_replica1/?q=x+x+xx&sort=score+desc&wt=json&indent=true&debugQuery=true&defType=edismax&qf=productName
> >
> ^1.5+productDescription&mm=1&pf=productName+productDescription&ps=1&pf2=productName+productDescription&pf3=productName+productDescription&stopwords=true&lowercaseOperators=true
>


Re: 'Illegal character in query' on Solr cloud 4.10.1

2014-12-25 Thread S.L
Jack,

I am using this query to test from the browser, and this occurs consistently
for 5 of the 6 servers in the cluster; the actual API that I use is pysolr,
so from the front end the query is sent using pysolr.

I face the same issue in both Firefox and Google Chrome. The fact that
there is an existing Jira for a similar issue made me think this is a
Solr issue, but I am still not clear on how I can circumvent it.
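
One way to sidestep the container-level character check, whatever its cause, is to make sure the client never sends a raw ^: percent-encode the query parameters before the request leaves the application. A minimal Python sketch (parameter values are taken from the query above; this is a workaround, not a fix for the Tomcat configuration):

```python
from urllib.parse import urlencode

# Raw parameter values; urlencode() percent-escapes the boost caret
# and other reserved characters for us.
params = {
    "q": "canon pixma printer",
    "defType": "edismax",
    "qf": "productName^1.5 productDescription",
    "mm": "1",
    "wt": "json",
}
query_string = urlencode(params)
print(query_string)  # the caret is sent as %5E, which any container accepts
```

pysolr already encodes its parameters via the underlying HTTP library, so if a raw caret is reaching Tomcat, the suspect is whatever code path bypasses that encoding (for example, a hand-built URL).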





On Wed, Dec 24, 2014 at 4:57 PM, Jack Krupansky 
wrote:

> Is the problem here that the error occurs sometimes or that it doesn't
> occur all of the time? I mean, it is clearly a bug in the client if it is
> sending a raw circumflex rather than a URL-encoded circumflex.
>
> Also, some browsers automatically URL-encode character as needed, but I
> have heard that some browsers don't always encode all of the characters.
>
> Question: You mention the URL, but how are you sending that URL to Solr -
> via a browser address box, curl, or... what?
>
> If using curl, you also have to cope with some characters having a shell
> meaning and needing to be escaped.
>
> Whether it is Tomcat or Solr that gives the error, the main point is that
> the raw circumflex shouldn't be sent to either.
>
>
> -- Jack Krupansky
>
> On Wed, Dec 24, 2014 at 4:32 PM, Erick Erickson 
> wrote:
>
> > OK, then I don't think it's a Solr problem. I think 5 of your Tomcats are
> > configured in such a way that they consider ^ to be an illegal character.
> >
> > There have been recurring problems with Servlet containers being
> > configured to allow/disallow various characters, and I think that's
> > what's happening here. But this is totally outside Solr.
> >
> > Solr, when it successfully distributes a query, sends the query on to one
> > replica of each shard, and I was wondering if that process wasn't
> > working correctly somehow, although boosting is so common that it
> > would be a huge shock since it would have broken almost every
> > Tomcat installation out there. By sending the query directly to each
> > node, you've bypassed any forwarding by Solr so it looks like the
> > problem is before Solr even sees it.
> >
> > So my guess is that somehow 5 of your servers are configured to
> > expect a different character than the server that works. I'm afraid
> > I don't know Tomcat well enough to direct you there, but take a
> > look here:
> > https://wiki.apache.org/solr/SolrTomcat
> >
> > Sorry I can't be more help
> > Erick
> >
> > On Wed, Dec 24, 2014 at 1:33 AM, S.L  wrote:
> > > Erik,
> > >
> > > The scenario 1, that you have listed is what seems to be the case.
> > >
> > > When I add distrib=false to query each one of the 6 servers only 1 of
> > them
> > > returns results (partial) and the rest of them give the illegal
> character
> > > error .
> > >
> > > I have not set up any special logging I do not see any info in the
> > > catalina.out but in a file called localhost_access_log.2014-12-24.txt
> in
> > > tomcat/logs directory, I see the following logging message when the
> > invalid
> > > character error occurs.
> > >
> > > [24/Dec/2014:09:25:54 +] "GET
> > >
> >
> /solr/dyCollection1_shard2_replica1/?fl=*,score&q=canon+pixma+printer&sort=score+desc,productNameLength%20asc&wt=json&indent=true&rows=100&defType=edismax&qf=productName&mm=2&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true&bq=hasThumbnailImage:true^2.0&distrib=false
> > > HTTP/1.1" 500 7781
> > >
> > > I am using Tomcat 7.0.42 and SolrCloud 4.10.1 and the Oracle JDK .
> > >
> > > java version "1.7.0_71"
> > > Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
> > > Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
> > >
> > > Thanks.
> > >
> > > On Tue, Dec 23, 2014 at 11:46 AM, Erick Erickson <
> > erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Hmmm, so you are you pinging the servers directly, right?
> > >> Here's a couple of things to try:
> > >> 1> add &distrib=false to the query and try each of the 6 servers.
> > >> What I'm wondering is if this is happening on the sub-query sent
> > >> out or on the primary server. Adding &distrib=false will just execute
> > >> on the node you're sending it to, and will NOT send sub-querie

distrib=false

2014-12-27 Thread S.L
Hi All,

I have a question regarding distrib=false on a Solr query. It seems that
distribution is restricted only across shards when the parameter is set to
false, meaning that if I query a particular node within a shard with a
replication factor of more than one, the request could still go to another
node within the same shard that is a replica of the node I sent the
initial request to. Is my understanding correct?

If the answer to my question is yes, then how do we make sure that the
request goes only to the node I intend to send it to?

Thanks.
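
As an aside, the request I am experimenting with is built along these lines, addressing one core directly by its full name (Python sketch; the host and core names are placeholders):

```python
from urllib.parse import urlencode

host = "http://solr-node1:8081"          # placeholder node address
core = "dyCollection1_shard2_replica1"   # fully qualified core name

params = {"q": "*:*", "wt": "json", "distrib": "false"}
url = "%s/solr/%s/select?%s" % (host, core, urlencode(params))
print(url)  # fully escaped URL, aimed at exactly one core
```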


Re: distrib=false

2014-12-28 Thread S.L
Erick,

I have attached a screenshot of the topology. As you can see, I have
three nodes, and no two replicas of the same shard reside on the same node;
this was done so as not to affect availability.

The query I use to test is a general get-all query of the form *:*.

The behavior I notice is that even when a particular replica of a
shard is queried using distrib=false, the request goes to the other
replica of the same shard.

Thanks.

On Sat, Dec 27, 2014 at 2:10 PM, Erick Erickson 
wrote:

> How are you sending the request? AFAIK, setting distrib=false
> should should keep the query from being sent to any other node,
> although I'm not quite sure what happens when you host multiple
> replicas of the _same_ shard on the same node.
>
> So we need:
> 1> your topology, How many nodes and what replicas on each?
> 2> the actual query you send.
>
> Best,
> Erick
>
> On Sat, Dec 27, 2014 at 8:14 AM, S.L  wrote:
> > Hi All,
> >
> > I have a question regarding distrib=false on the Solr query , it seems
> that
> > the distribution is restricted across only the shards  when the parameter
> > is set to false, meaning if I query a particular node with in a shard
> with
> > replication factor of more than one  , the request could go to another
> node
> > with in the same shard which is a replica of the node that I made the
> > initial request to, is my understanding correct ?
> >
> > If the answer to my question is yes, then how do we make sure that the
> > request goes to only the node I intend to make the request to  ?
> >
> > Thanks.
>


How to implement multi-set in a Solr schema.

2014-12-28 Thread S.L
Hi All,

I have a use case where I need to group documents that share a field
called bookName, meaning that if there are multiple documents with the same
bookName value and the user's input is searched via a query on bookName,
I need to be able to group all the documents with the same bookName together
so that I can display them as a group in the UI.

What kind of support does Solr provide for such a scenario, and how should
I look at changing my schema.xml, which has bookName as a single-valued text
field?

Thanks.
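
Solr's Result Grouping (field collapsing) covers exactly this: group=true with group.field collapses documents that share a field value into one group per value. One caveat is that grouping works on a non-tokenized type, so the usual pattern is to keep the tokenized bookName for searching and copyField it into a string field for grouping. A hedged sketch of the request parameters (the bookName_str field and the sample query are assumptions, not from the original schema):

```python
from urllib.parse import urlencode

params = {
    "q": "bookName:(harry potter)",  # hypothetical user query, placeholder
    "group": "true",
    "group.field": "bookName_str",   # assumed untokenized copy of bookName
    "group.limit": "10",             # docs returned per group
    "wt": "json",
}
qs = urlencode(params)
print(qs)
```

Each group in the response then carries the shared bookName value plus its member documents, which maps directly onto a grouped UI display.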


DocExpirationUpdateProcessorFactory not deleting records

2015-02-04 Thread S.L
I am trying to use the DocExpirationUpdateProcessorFactory in Solr
version 4.10.1.

I have included the following in my solrconfig.xml:

<updateRequestProcessorChain default="true">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">timestamp</str>
  </processor>
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">ttl</str>
    <str name="expirationFieldName">expire_at</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

And I have included the following in my schema.xml:

<field name="ttl" type="date" indexed="true" stored="true"
       default="NOW+60SECONDS" multiValued="false"/>

As you can see, I am setting the time to live to 60 seconds and checking
for expired documents every 30 seconds; yet when I insert a document and
check after a minute, a couple of minutes, or an hour, it never gets deleted.

This is what I see in the indexed document; can you please let me know
what might be the issue here? Please note that the expire_at field is
never generated in the Solr document, as can be seen below.



"id": "3888a8ac-fbc4-437a-8248-132384753c00",
"timestamp": "2015-02-04T04:09:21.29Z",
"_version_": 1492147724740460500,
"ttl": "2015-02-04T04:10:21.29Z"


Re: DocExpirationUpdateProcessorFactory not deleting records

2015-02-04 Thread S.L
Thanks for giving multiple options; I'll try them both out. But the last
time I checked, having "+60SECONDS" as the default value for ttl was giving
me an invalid date format exception. I am assuming that would only be the
case if I use it with the default mechanism in schema.xml, but not when we
use solr.DefaultValueUpdateProcessorFactory?



On Wed, Feb 4, 2015 at 1:56 PM, Chris Hostetter 
wrote:

>
> :  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
> :    <int name="autoDeletePeriodSeconds">30</int>
> :    <str name="ttlFieldName">ttl</str>
> :    <str name="expirationFieldName">expire_at</str>
> :  </processor>
>
> ...
>
> : And I have included the following in my schema.xml
> :
> :  <field name="ttl" type="date" indexed="true" stored="true" default="NOW+60SECONDS" multiValued="false"/>
>
> there are a couple of problems here...
>
> : As you can see I am setting the time to live to be 60 seconds and
> checking
> : to delete every 30 seconds, when I insert a document , and check after a
> : minute or couple or an hour it never gets deleted.
>
> first off: you aren't actaully setting the ttl to "60 seconds" you are
> setting the ttl to be a fixed moment in time which is 60 seconds from when
> the doc is written to the index -- basically you are eliminating hte need
> for having a ttl field/param at all and saying "this is *exactly* when the
> document should expire".
>
> if that's what you want to do, just elimintae the ttleFieldName everywhere
> in your schema.xml and solrconfig.xml and setup expire_at in your
> schema.xml with a default="NOW+60SECONDS" and you'll probably be good to
> go.
>
> second...
>
> : what might be the issue here ? Please note that the expire_at field is
> : never getting generated in the Solr document as can be seen below.
>
> ...even if you redefined your ttl field to look like this...
>
>   <field name="ttl" type="string" indexed="true" stored="true" default="+60SECONDS"/>
>
> ...the expire_at still wouldn't be populated by the processor because
> schema field "default" values are populated *after* the processors run --
> so when the DocExpirationUpdateProcessorFactory sees the documents being
> added, it has no idea that they all have a default ttl, so it doesn't know
> that you want it to compute an expire_at for you.
>
> instead of using default="" in the schema, you can use the
> DefaultValueUpdateProcessorFactory to assign it *before* the
> DocExpirationUpdateProcessorFactory sees the doc...
>
>  <processor class="solr.DefaultValueUpdateProcessorFactory">
>    <str name="fieldName">ttl</str>
>    <str name="value">+60SECONDS</str>
>  </processor>
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: DocExpirationUpdateProcessorFactory not deleting records

2015-02-04 Thread S.L
Great, this is the first example I have seen so far, I wish we could
include this in the Wiki. Thanks again!
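
Putting Hoss's two pieces together, the working chain looks roughly like this (a sketch only; the surrounding Log/Run processors and the chain layout are assumptions, and ttl must be declared as a string field in schema.xml, not a date):

```xml
<updateRequestProcessorChain default="true">
  <!-- assign ttl *before* the expiration processor sees the doc -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">ttl</str>
    <str name="value">+60SECONDS</str>
  </processor>
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">ttl</str>
    <str name="expirationFieldName">expire_at</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```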

On Wed, Feb 4, 2015 at 2:04 PM, Chris Hostetter 
wrote:

> :
> : Thanks for giving multiple options , I ll try them out both ,but last
> time
> : I checked, having "+60SECONDS" as the default value for ttl was giving me
> : an invalid date format exception, I am assuming that would only be
> the
>
> that's because ttl should not be a date field -- it should be a *string*
> (as noted in my examples)
>
> "time to live" is a date math expression that the processor will evaluate
> for you -- not a date.  if you want to specify an explicit date, just set
> expire_at directly.
>
> ie: do you wnat to do the match yourself (set expire_at as a date field)
> or do you want the processor to do the math itself (set ttl as a string
> field)
>
> : > ...even if you redefined your ttl field to look like this...
> : >
> : >   <field name="ttl" type="string" indexed="true" stored="true" default="+60SECONDS"/>
> : >
> : > ...the expire_at still wouldn't be populated by the processor because
> : > schema field "default" values are populated *after* the processors run
> --
> : > so when the DocExpirationUpdateProcessorFactory sees the documents
> being
> : > added, it has no idea that they all have a default ttl, so it doesn't
> know
> : > that you want it to compute an expire_at for you.
> : >
> : > instead of using default="" in the schema, you can use the
> : > DefaultValueUpdateProcessorFactory to assign it *before* the
> : > DocExpirationUpdateProcessorFactory sees the doc...
> : >
> : >  <processor class="solr.DefaultValueUpdateProcessorFactory">
> : >    <str name="fieldName">ttl</str>
> : >    <str name="value">+60SECONDS</str>
> : >  </processor>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Trending functionality in Solr

2015-02-08 Thread S.L
Folks,

Is there a way to implement trending functionality using Solr, i.e. to
return, via a query, the most-searched terms in the past hour or so? If
the most-searched terms are not possible, is it at least possible to get
the results for the last 100 terms?

Thanks


Re: Trending functionality in Solr

2015-02-09 Thread S.L
Folks,

Thanks for this wealth of information. The consensus generally seems to be
that one should save the queries in another Solr core, timestamp them, and
do further analysis on that data. I will try to implement the same.

Siegfried, I looked at your JIRA issue, which is impressive but would be
overkill in my situation, so I will implement something simpler for my use
case.

Thanks again everyone for this help.
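
For anyone landing here later, the shape of the simple approach in Python (a sketch under assumptions: the query_s/timestamp_dt dynamic-field names and the side "queries" core are made up, and the pysolr call is only indicated in a comment):

```python
import time
from urllib.parse import urlencode

def log_query_doc(q):
    # Document to index into the side "queries" core for each search.
    return {
        "id": "%s-%d" % (q, time.time() * 1000),
        "query_s": q,
        "timestamp_dt": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

def trending_params(window="NOW-1HOUR"):
    # Facet on the stored query strings within a recent window; the
    # top facet buckets are the most-searched terms.
    return {
        "q": "*:*",
        "fq": "timestamp_dt:[%s TO NOW]" % window,
        "rows": "0",
        "facet": "true",
        "facet.field": "query_s",
        "facet.limit": "10",
        "facet.mincount": "1",
        "wt": "json",
    }

print(urlencode(trending_params()))
# with pysolr, roughly: queries_core.add([log_query_doc(q)]) per search,
# then a search with these parameters to read the trending terms.
```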

On Mon, Feb 9, 2015 at 3:14 AM, Siegfried Goeschl  wrote:

> Hi folks,
>
> I implemented something similar but never got around to contribute it -
> see https://issues.apache.org/jira/browse/SOLR-4056
>
> The code was initially for SOLR3 but was recently ported to SOLR4
>
> * capturing the most frequent search terms per core
> * supports ad-hoc queries
> * CSV export
>
> If you are interested we could team up and make a proper SOLR contribution
> :-)
>
> Cheers,
>
> Siegfried Goeschl
>
>
> On 08.02.15 05:26, S.L wrote:
>
>> Folks,
>>
>> Is there a way to implement the trending functionality using Solr , to
>> give
>> the results using a query for say the most searched terms in the past
>> hours
>> or so , if the most searched terms is not possible is it possible to at
>> least the get results for the last 100 terms?
>>
>> Thanks
>>
>>
>


Re: [MASSMAIL]Re: Trending functionality in Solr

2015-02-09 Thread S.L
Thanks for stating this in a simple fashion.

On Sun, Feb 8, 2015 at 6:07 PM, Jorge Luis Betancourt González <
jlbetanco...@uci.cu> wrote:

> For a project I'm working on, what we do is store the user's query in a
> separated core that we also use to provide an autocomplete query
> functionality, so far, the frontend app is responsible of sending the query
> to Solr, meaning: 1. execute the query against our search core and 2. send
> an update request to store the query in the separated core. We use some
> deduplication (provided by Solr) to avoid storing the same query several
> times. We don't do what you're after but it would't be to hard to tag each
> query with a timestamp field and provide analytics. Thinking from the top
> of my head we could wrap this logic that is currently done in the frontend
> app in a custom SearchComponent that automatically send the search query
> into the other core for storing, abstracting all this logic from the client
> app. Keep in mind that the considerations regarding volume of data that
> Shawn has talked keeps being valid.
>
> Hope it helps,
>
> - Original Message -
> From: "Shawn Heisey" 
> To: solr-user@lucene.apache.org
> Sent: Sunday, February 8, 2015 11:03:33 AM
> Subject: [MASSMAIL]Re: Trending functionality in Solr
>
> On 2/7/2015 9:26 PM, S.L wrote:
> > Is there a way to implement the trending functionality using Solr , to
> give
> > the results using a query for say the most searched terms in the past
> hours
> > or so , if the most searched terms is not possible is it possible to at
> > least the get results for the last 100 terms?
>
> I'm reasonably sure that the only thing Solr has out of the box that can
> record queries is the logging feature that defaults to INFO.  That data
> is not directly available to Solr, and it's not in a good format for
> easy parsing.
>
> Queries are not stored anywhere else by Solr.  From what I understand,
> analysis is a relatively easy part of the equation, but the data must be
> available first, which is the hard part.  Storing it in RAM is pretty
> much a non-starter -- there are installations that see thousands of
> queries every second.
>
> This is an area for improvement, but the infrastructure must be written
> from scratch.  All work on this project is volunteer.  We are highly
> motivated volunteers, but extensive work like this is difficult to fit
> into donated time.
>
> Many people who use Solr are already recording all queries in some other
> system (like a database), so it is far easier to implement analysis on
> that data.
>
> Thanks,
> Shawn
>
>